Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repeating easy precondition increases verification time #637

Open
bobismijnnaam opened this issue Sep 1, 2022 · 8 comments
Open

Repeating easy precondition increases verification time #637

bobismijnnaam opened this issue Sep 1, 2022 · 8 comments

Comments

@bobismijnnaam
Copy link

In the below file, if a certain precondition is repeated, verification becomes slower with every added precondition, up to the point that the viper ide times out. Carbon verifies it in one or two seconds.

This precondition should be straightforward to derive from the context. The file contains only one implemented function, which is more or less trivial, so I would expect silicon to verify it in at most a few seconds. Instead verification takes 10 seconds, and verification time can easily be increased to 100+ seconds by duplicating preconditions.

There is a function called "matrixValues", which requires a quantified permission. If that requirement is commented, verification time drops below one second again. So I think it might have to do with the heavier encoding for heap-dependent functions?

I also looked at the smt2 file; it seems to spend the most time on establishing the "NonNegativeCapacities" precondition in "SumIncomingFlow", given that "NonNegativeCapacities" is required in "maxFlow". The more often z3 checks this fact, the longer verification time becomes. I would expect this to be trivial after it is established once, but apparently I am either misunderstanding and this is hard, or z3 is accidentally doing more work than is needed.

I would like to investigate further, but unfortunately I am out of ideas, and I also don't think the example can be made much smaller without thorough understanding of viper. If you could give me pointers on what else I could try I'd be happy to keep looking.

@mschwerhoff
Copy link
Contributor

Before getting into details: the QP assertions in matrixValues and maxFlow don't have triggers. That's hardly ever good.

Generally, Silicon symbolically evaluates each Viper expression (assertion), i.e. translates it from Viper to SMT. If they are heap-dependent, this may introduce new heap snapshots, representing footprints of involved (sub)expressions. Syntactically equal expressions might translate to syntactically different SMT code because of syntactically different (-ly named) snapshots. The latter may be semantically equivalent, but figuring this out often involves instantiating the corresponding definitional axioms — and until semantic equivalence has been established, each syntactically different SMT term may cause further quantifier instantiations. Silicon does some internal caching to reduce redundant expression evaluation, but the caching heuristics have to be very simple to not introduce too much overhead themselves.

In your case, repeatedly evaluation matrixValues(...) may introduce many such snapshots, and repeated use of wildcard may additionally introduce work for the non-linear arithmetic subsolver. You could try to narrow down suspects by

  1. replacing wildcards by fixed fractions (and leaving the rest of the program unchanged)
  2. additionally or orthogonally pulling out matrixValues(G, V) as follows (sort of a let-binding for debugging purposes):
    1. add a fourth parameter mv: Seq[Seq[Int]] to maxFlow
    2. add precondition mv == matrixValues(G, V)
    3. replace each matrixValues(G, V) in the remaining preconditions by mv

Let me know if this yields any further insights into the problem.

@bobismijnnaam
Copy link
Author

Thank you for your explanation! I will have another look!

@ArmborstL
Copy link

FWIW, last week I already tried replacing the wildcard with fixed fractions, as well as adding some manual (naive) triggers. Neither seemed to fix the problem. A first look by Bob after your comment seems to indicate that it really comes down to the snapshots. I guess we'll have to see what lessons to take from that (e.g. in terms of "do's and don't's" for users).

@bobismijnnaam
Copy link
Author

bobismijnnaam commented Sep 9, 2022

To expand upon that: after applying the let/equality rewrite, verification time on silicon becomes quick again (around 6s on my laptop, and doubling the contract makes no noticeable difference).

The snapshots of matrixValues is different for each appearance in the contract:

(assert (Seq_equal
  (matrixValues ($Snap.combine
    $Snap.unit
    ($SortWrappers.$FVF<array>To$Snap (as sm@18@01  $FVF<array>))) G@6@01 V@7@01)
  (matrixValues ($Snap.combine
    $Snap.unit
    ($SortWrappers.$FVF<array>To$Snap (as sm@23@01  $FVF<array>))) G@6@01 V@7@01)))

Through earlier defined foralls the sm@... values are related back to the method input snapshot $t@10@01. Boogie seems to reuse the same snapshot: I guess boogie doesn't use snapshots because it uses an entirely different encoding... But it's maybe still worth pointing out that in that case there is syntactic equality:

assume Seq#Equal(matrixValues(Heap, G, V), matrixValues(Heap, G, V));

Since the equality of matrixValues is being assumed (meaning, it's not negated before the check-sat), I don't see yet how differing snapshots can cause longer verification times. Instead I feel like it should be something about the precondition of matrixValues that causes long verification times. The fact that the long verification time also happens without the use of wildcard is peculiar. Even the verification time for 1/1 or 1/2 permission is the same.

Now that matrixValues seems to cause the problem, I minimized the files a bit further.

I guess at this point I should start minimizing the smt2 and put that in axiom profiler to see what's going on...?

@bobismijnnaam
Copy link
Author

bobismijnnaam commented Sep 9, 2022

I think the culprit is the following expression in the smt output:

; Chunk depleted?
(set-option :timeout 500)
(push) ; 3
(assert (not (forall ((r $Ref))
  (=
    (-
      (ite
        (and (< (inv@13@01 r) V@7@01) (<= 0 (inv@13@01 r)))
        (/ (to_real 1) (to_real 2))
        $Perm.No)
      (pTaken@1062@01 r))
    $Perm.No))))
(check-sat)
; unknown

It appears a lot in the smt output. I watched the smt file being written with tail -f. Verification starts out quick and the output is not really readable. But after a while the time spent on each ; Chunk depleted? check begins to increase. Up to a duration of 500ms, where it times out. Should the timeout be lower? Or could there be a reason the chunk depleted check is taking so long, which we can maybe solve?

(This is all based on the viper file I posted above, except with the precondition of maxFlow duplicated 100 times to make the effect more easily measurable and visible in the smt2)

@mschwerhoff
Copy link
Contributor

You could dump Z3 statistics between check-sat calls to see which numbers increase, and from that get an idea of which subsolvers uses up the time bound. However, in my experience, interpreting Z3 statistics is close to impossible — and moreover, there is still quite a gap between knowing the culprit theory (probably quantifiers) and reducing its workload.

The chunk depletion checks are a two-sided sword: given many chunks, they save time if the first few provide sufficient permission, but they cost time if the permission removal loop cannot be exited prematurely. You could simply remove the checks from Silicon's codebase and see if that helps. If so, I'd be happy to accept a PR with a command-line option for disabling the checks.

Regarding the timeout: lowering it reduces the chance of detecting that the removal loop can stop early, but it saves time if the loop cannot be exited prematurely anyway. The 500ms were empirically obtained, and a good compromise at some point during Viper's/Silicon's development ... but other values might be better now, or for specific use-cases. It's a dark craft, really :-)

@mschwerhoff
Copy link
Contributor

A comment regarding your quantifier

forall a2: array :: true ==> (forall i1: Int :: { array_loc(a2, i1) } true ==> ...) 

Given the nested quantifier forall x :: forall y :: f(x,y), the outer one does not admit any meaningful trigger. I'm not sure if Z3 internally rewrites such quantifiers and then picks f(x,y), but I would not rely on it. Instead, un-nest the quantifiers to forall x, y :: f(x,y).

@bobismijnnaam
Copy link
Author

Thanks again for your quick response!

I dumped z3 statistics inbetween chunk duplicated check-sat's, but the numbers seemed to increase only a little bit, and pretty evenly spread across most categories at that. So I don't think there are good hints there.

I tried putting my z3.log's into axiom-profiler and z3-tracer. Axiom-profiler hangs as soon as I duplicate the precondition more than once. z3-tracer accepts the log but prints an almost trivial quantifier instantiation graph (just 5 disconnected nodes). All other graphs that z3-tracer can produce don't look to bad I think. So either there's not much interesting happening quantifier-wise or z3-tracer is not reading the log correctly.

While looking through the smt2 the following two parts caught my eye. This one is fairly early on in the file (included below):

(assert (Seq_equal
  (mv ($Snap.combine
    $Snap.unit
    ($SortWrappers.$FVF<array>To$Snap (as sm@48@01  $FVF<array>))) G@6@01 V@7@01)
  (mv ($Snap.combine
    $Snap.unit
    ($SortWrappers.$FVF<array>To$Snap (as sm@53@01  $FVF<array>))) G@6@01 V@7@01)))
(assert (=
  ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second $t@10@01))))))
  ($Snap.combine
    ($Snap.first ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second $t@10@01)))))))
    ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second $t@10@01))))))))))
(assert (=
  ($Snap.first ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second $t@10@01)))))))
  $Snap.unit))

And then at the end of the file:

(assert (Seq_equal
  (mv ($Snap.combine
    $Snap.unit
    ($SortWrappers.$FVF<array>To$Snap (as sm@588@01  $FVF<array>))) G@6@01 V@7@01)
  (mv ($Snap.combine
    $Snap.unit
    ($SortWrappers.$FVF<array>To$Snap (as sm@593@01  $FVF<array>))) G@6@01 V@7@01)))
(assert (=
  ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second $t@10@01))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
  ($Snap.combine
    ($Snap.first ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second $t@10@01)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
    ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second $t@10@01))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
(assert (=
  ($Snap.first ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second ($Snap.second $t@10@01)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
  $Snap.unit))

Inbetween those two parts, the structure seems to repeat every 500 lines or so, and grows bigger each subsequent time. It's not very useful either, as I think it concludes every single time that some huge snapshot term is equal to unit. Could this maybe be related to the slow chunk depleted checks?

logfile-01.smt2.txt (acquired from the viper file I posted above, by repeating the precondition a whole bunch of times)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants