Skip to content

CREST Frequently Asked Questions

Jacob Burnim edited this page Feb 5, 2014 · 1 revision

Can CREST be used under Cygwin?

Yes, but a few changes are necessary to build CREST under Cygwin.

In particular, addr_t cannot be used as a typedef under Cygwin, so all uses of addr_t must be renamed. This can be done with sed:

sed -i -e 's/addr_t/addr2_t/g' src/*/*

Some users have reported that id_t must also be renamed.

Can CREST be used on multi-file programs?

Yes. The easiest way to do this is to use CIL to merge all of the source files into a single file, and then run crestc on the merged file.

For further information on merging souce files with CIL, see (from the CIL documentation): Using the merger.

As an example, for GNU grep 2.2, it is sufficient to patch src/Makefile with (after running configure):

45c45
< AUTOMAKE = automake
---
> AUTOMAKE = echo
63c63
< CC = gcc
---
> CC = cilly
116c116
< LDFLAGS = 
---
> LDFLAGS = --merge --keepmerged
133c133
< CFLAGS = -g -O2
---
> CFLAGS = --merge -g -O2

and then to simply run make in the src/ directory. This will produce a combined source file grep_comb.c, as well as executable grep.

(Changing AUTOMAKE should not be necessary, but I cannot get this to work on my machine without this change.)

Does CREST save the exact set of branches that it covers during testing?

Yes. These branches are written to the file coverage in the working directory. The file is a list of branch identifier numbers, one per line.

Unfortunately, it is somewhat difficult to match these identifier numbers back to branches in the program under test. Currently, the only way to do this is to find the branch ID numbers in the CIL-instrumented source for the test program. (When run on a file test.c, crestc produces an instrumented source file test.cil.c.) Every branch in the instrumented file will contain an instrumentation call:

__CrestBranch(statement-id, branch-id, true/false);

For example, the call

if (a == 0) {
  __CrestBranch(47, 18, 1);
  ...

indicates that this branch has identifier 18 (and is the "true" branch of the conditional if (a == 0)).

Does CREST save the test inputs it generates?

No. CREST does not currently save the inputs it generates during testing.

If you need this feature, it can be implemented by adding only a few lines of code to src/run_crest/concolic_search.cc. In method Search::RunProgram(), just before the call to LaunchProgram(), add:

  // Save the given inputs.
  char fname[32];
  snprintf(fname, 32, "input.%d", num_iters_);
  WriteInputToFileOrDie(fname, inputs);

How do I run CREST on the GNU grep example?

Once you have compiled/instrumented grep, use:

  run_crest './grep aaaaaaaaaa /dev/null' 100 -random

This runs grep with a regexp/pattern of 10 symbolic characters. The output should look something like:

  Iteration 0 (0s): covered 0 branches [0 reach funs, 0 reach branches].
  RESET

  Iteration 1 (0s): covered 446 branches [40 reach funs, 2448 reach branches].
  Solved 690/1455


  Iteration 2 (0s): covered 459 branches [40 reach funs, 2448 reach branches].
  Solved 10/1122
  ...

(We modified grep so that it makes the regular expression symbolic, but it still expects grep to be run with a regexp argument to specify the length. E.g., above "aaaaaaaaaa" indicates a symbolic regexp of length 10.)

Similarly, although no characters will be read from the given file -- symbolic characters will be read instead -- a file argument must still be given. And grep must be able to open this file. /dev/null should always work.)

What do the numbers in the branches file mean?

When crestc instruments a C file, a file named branches is produced. The branches file contains the ID numbers for each branch in the instrumented program.

The branches file consists of one or more function entries, which lists the branches contained in one function/procedure. Each file entry has the format:

function_id N
b1_then b1_else
b2_then b2_else
...
bN_then bN_else

The first line of an entry contains a function ID number and the number N of remaining lines in the function entry -- N is the number of conditional statements in the corresponding function. The N remaining lines each contain two branch ID's: one for the then/true branch of a conditional statement and one for the else/false branch of the conditional statement.

For example, for the file:

void abs(int x) {
    if (x < 0) {
        return -x;    // ID 1
    } else {
        return x;    // ID 2
    }
}

int main(void) {
    int a;
    CREST_int(a);

    if (a < 0) {
        // ID 3
        if (abs(a) == a) {
            printf("ERROR");  // ID 4
        else {
            // ID 5
        }
    } else {
        // ID 6
    }

    return 0;
}

running crestc will yield a branches file with two function entries:

1 1    [start of entry 1]
1 2
2 2    [start of entry 2]
3 6
4 5

What does the output of crestc mean?

... omitted ...
Read 90 branches.
Read 232 nodes.
Wrote 145 branch edges.

The three numbers output by crestc are:

  1. "Branches": The number of conditional branches in the program. A branch is the true or false side of a conditional statement, so this number is roughly twice the number of "if" statements in a program. (Other control flow, such as loops and "switch" statements, are translated in goto's and "if" statements.) For example, the following program has 4 branches:
1:  if (x < 10) {
2:      printf("Less than 10.\n");    // branch 1
3:  } else {
4:      // branch 2
5:  }
6:  while (y != 0) {
7:      y--;    // branch 3
8:  }
9:  return 0;    // branch 4  [when y == 0]
  1. "Nodes": This is the number of vertices in the program's control-flow graph.

  2. "Branch edges": The control-flow graph is refined by removing all nodes that are not branches (and adding edges so that this graph is properly connected). The number of "branch edges" is the number of edges remaining in this refined graph.

For example, for the above program, the refined CFG has only 4 nodes: lines 2, 4, 7, and 9, with edges (2, 7), (2, 9), (4, 7), (4, 9), (7, 9). (More details on this transformation and the reasons for doing it can be found in our ASE paper: Burnim, Sen, "Heuristics for Scalable Dynamic Test Generation", ASE 2008.)

What does the output of run_crest mean?

... omitted ...
Iteration 3 (0s): covered 5 branches [1 reach funs, 88 reach branches].
Iteration 4 (0s): covered 23 branches [1 reach funs, 88 reach branches].
... omitted ...

Each iteration in run_crest consists of one run of the program under test. In each iteration, the number of covered branches is simply the number of unique branches (as defined above) that have been encountered in any execution of the program under test from one of the iterations so far during this run of run_crest.

To (very roughly) estimate the number of program branches that are reachable, we also report the number of unique functions from the program that have been encountered during concolic testing, and the sum of all of the branches in each of those functions. (We use this feature for testing larger programs like Vim, with thousands of functions. Depending on build and command-line options, many of these functions are not reachable and thus their branches should not be counted (in the denominator) when computing the coverage.