# Xiaohong's Research Diary

## June 4, 2017

I am looking for a good way of implementing context reasoning.

## May 28, 2017

The ml2fol translation can solve some problems, but not all of them. In fact, it fails to prove some very basic properties. So now I (have to) turn to work on the interactive prover.

By the way, it looks like we can use machine learning to build a smarter automated theorem prover.

## May 18, 2017

A simpler example in which Z3 fails to prove 0 + 1 = 1:

`(declare-sort Nat) (declare-fun zero () Nat)(declare-fun succ (Nat) Nat)(declare-fun plus (Nat Nat) Nat) (assert (forall ((x Nat))   (= x (plus x zero)))) (assert (forall ((x Nat) (y Nat))   (forall ((\$6 Nat))     (= (= \$6 (succ (plus x y)))        (= \$6 (plus x (succ y))))))) (assert (not   (= (succ zero)      (plus zero (succ zero))))) (check-sat)`

## May 17, 2017

How to trick Z3 so that it can't prove 1 + 1 = 2:

`(declare-sort Nat) (declare-fun zero () Nat)(declare-fun succ (Nat) Nat)(declare-fun plus (Nat Nat) Nat) ; x + 0 = x(assert (forall ((x Nat) (\$2 Nat))   (= (= \$2 x)      (exists ((\$3 Nat) (\$4 Nat))        (and (= \$2 (plus \$3 \$4))             (= \$3 x)             (= \$4 zero)))))) ; x + succ(y) = succ(x + y)(assert (forall ((x Nat) (y Nat) (\$6 Nat))   (= (exists ((\$7 Nat) (\$8 Nat))        (and (= \$6 (plus \$7 \$8))             (= \$7 x)             (exists ((\$9 Nat))               (and (= \$8 (succ \$9))                    (= \$9 y)))))      (exists ((\$10 Nat))        (and (= \$6 (succ \$10))             (exists ((\$11 Nat) (\$12 Nat))               (and (= \$10 (plus \$11 \$12))                    (= \$11 x)                    (= \$12 y)))))))) ; negation of 1 + 1 = 2(assert (not (forall ((\$14 Nat))   (= (exists ((\$15 Nat) (\$16 Nat))        (and (= \$14 (plus \$15 \$16))             (exists ((\$17 Nat))               (and (= \$15 (succ \$17))                    (= \$17 zero)))             (exists ((\$18 Nat)) (and (= \$16 (succ \$18)) (= \$18 zero)))))      (exists ((\$19 Nat))        (and (= \$14 (succ \$19))             (exists ((\$20 Nat))               (and (= \$19 (succ \$20))                    (= \$20 zero))))))))) (check-sat)`

## May 16, 2017

I have had a question about writing good test cases for a long time: whether test cases (of correctness testing) should be written against implementation or specification.

One good (and real) example is using lists to implement sets. Sets are lists that don't have duplicates in them. Now suppose a programmer implements the set_union function as follows.

`let set_union list1 list2 =  delete_duplicates (list1 @ list2)`

where @ is the "append" of two lists. Then the programmer writes the following test case for set_union

`assert_equal [1;2;3] (set_union [1;2] [2;3])`

and, of course, the test case passes, and the programmer happily continues to work on other things.

But wait, this test case expects that set_union respects the order of elements, which is stronger than the specification of the union of two sets.

## May 15, 2017

It is really exciting to see that the ml2fol conversion works for the list example: given axioms for naturals, sequences, and maps, Z3 (see code below) is able to show that (7 |-> 9 * 8 |-> 1 * 2 |-> 0 * 1 |-> 5) => list(7, [9; 5]).

`; Z3 code; The list example (declare-sort Nat)(declare-sort NatSeq)(declare-sort Map) ; Natural numbers (declare-fun zero () Nat)(declare-fun succ (Nat) Nat) ; define some constants for the sake of readability(declare-const one Nat)(declare-const two Nat)(declare-const five Nat)(declare-const seven Nat)(declare-const eight Nat)(declare-const nine Nat)(assert (= one (succ zero)))(assert (= two (succ one)))(assert (= five (succ (succ (succ two)))))(assert (= seven (succ (succ five))))(assert (= eight (succ seven)))(assert (= nine (succ eight))) ; succ is injective(assert (forall ((X Nat) (Y Nat))  (= (= X Y) (= (succ X) (succ Y))))) (assert (forall ((X Nat))  (not (= X (succ X))))) ; Sequence of Nat (declare-fun epsilon () NatSeq); the "cons" version of concat(declare-fun cncat (Nat NatSeq) NatSeq) (assert (forall ((X Nat) (S NatSeq))  (not (= (cncat X S) S)))) (assert (forall ((X1 Nat) (X2 Nat) (S1 NatSeq) (S2 NatSeq))  (= (= (cncat X1 S1) (cncat X2 S2))     (and (= X1 X2) (= S1 S2))))) ; Map (Heap) (declare-fun emp () Map) ; mapsto: Nat Nat -> Map is a partial function; mapsto(X, Y) is often written as X |-> Y(declare-fun total_mapsto (Nat Nat) Map)(declare-fun delta_mapsto (Nat Nat) Bool) ; mapsto(zero, Y) = bottom(assert (forall ((X Nat))  (not (delta_mapsto zero X)))) ; exists T . mapsto(succ(X), Y) = T(assert (forall ((X Nat) (Y Nat))  (delta_mapsto (succ X) Y))) ; mapsto(succ(X1), Y1) = mapsto(succ(X2), Y2) iff X1 = X2 and Y1 = Y2(assert (forall ((X1 Nat) (Y1 Nat) (X2 Nat) (Y2 Nat))  (= (= (total_mapsto (succ X1) Y1) (total_mapsto (succ X2) Y2))     (and (= X1 X2) (= Y1 Y2))))) ; merge: Map Map -> Map is an associative and commutative partial function ; with emp as the identity.(declare-fun total_merge (Map Map) Map)(declare-fun delta_merge (Map Map) Bool) ; commutativity(assert (forall ((H1 Map) (H2 Map))  (= (delta_merge H1 H2) (delta_merge H2 H1)))) (assert (forall ((H1 Map) (H2 Map))  (=> (or (delta_merge H1 H2) (delta_merge H2 H1))      (= (total_merge H1 H2) (total_merge H2 H1))))) ; associativity(assert (forall ((H1 Map) (H2 Map) (H3 Map))  (= (and (delta_merge H1 H2) (delta_merge (total_merge H1 H2) H3))     (and (delta_merge H2 H3) (delta_merge H1 (total_merge H2 H3)))))) (assert (forall ((H1 Map) (H2 Map) (H3 Map))  (=> (or (and (delta_merge H1 H2) (delta_merge (total_merge H1 H2) H3))          (and (delta_merge H2 H3) (delta_merge H1 (total_merge H2 H3))))      (= (total_merge (total_merge H1 H2) H3)         (total_merge H1 (total_merge H2 H3)))))) ; identity(assert (forall ((H Map))  (delta_merge emp H))) (assert (forall ((H Map))  (= H (total_merge emp H)))) ; merge(mapsto(X, Y), mapsto(X, Z)) = bottom(assert (forall ((X Nat) (Y Nat) (Z Nat))  (or (not (delta_mapsto X Y))      (not (delta_mapsto X Z))      (not (delta_merge (total_mapsto X Y) (total_mapsto X Z)))))) ; mapstoseq: Nat NatSeq -> Map is a partial function(declare-fun total_mapstoseq (Nat NatSeq) Map)(declare-fun delta_mapstoseq (Nat NatSeq) Bool) ; mapstoseq(X, epsilon) = emp(assert (forall ((X Nat))  (delta_mapstoseq X epsilon))) (assert (forall ((X Nat))  (= emp (total_mapstoseq X epsilon)))) ; mapstoseq(X, cncat(Y, S)) = merge(mapsto(X, Y), mapstoseq(succ(X), S))(assert (forall ((X Nat) (Y Nat) (S NatSeq))  (= (delta_mapstoseq X (cncat Y S))     (and (delta_mapsto X Y)          (delta_mapstoseq (succ X) S)          (delta_merge (total_mapsto X Y) (total_mapstoseq (succ X) S)))))) (assert (forall ((X Nat) (Y Nat) (S NatSeq))  (=> (or (delta_mapstoseq X (cncat Y S))          (and (delta_mapsto X Y)          (delta_mapstoseq (succ X) S)          (delta_merge (total_mapsto X Y) (total_mapstoseq (succ X) S))))      (= (total_mapstoseq X (cncat Y S))         (total_merge (total_mapsto X Y) (total_mapstoseq (succ X) S)))))) ; list: Nat NatSeq -> Map is a symbol(declare-fun pi_list (Nat NatSeq Map) Bool) ; list(X, epsilon) = emp /\ X = 0(assert (forall ((X Nat) (H Map))  (= (pi_list X epsilon H)     (and (= X zero) (= H emp))))) ; list(X, cncat(Y, S)) = exists Z . merge(mapstoseq(X, cncat(Y, cncat(Z, epsilon))), list(Z, S))(assert (forall ((X Nat) (Y Nat) (S NatSeq) (H Map))  (= (pi_list X (cncat Y S) H)     (exists ((Z Nat) (H1 Map))       (and (delta_mapstoseq X (cncat Y (cncat Z epsilon)))            (pi_list Z S H1)            (delta_merge (total_mapstoseq X (cncat Y (cncat Z epsilon))) H1)            (= H (total_merge (total_mapstoseq X (cncat Y (cncat Z epsilon))) H1))))))) ; Proof obligation; 7 |-> 9 * 8 |-> 1 * 2 |-> 0 * 1 |-> 5) => list(7, [9; 5]); Negate it and expect unsat(assert (not  (=> (and (delta_mapsto one five)           (delta_mapsto two zero)           (delta_mapsto seven nine)           (delta_mapsto eight one)           (delta_merge (total_mapsto seven nine)                         (total_mapsto eight one))           (delta_merge (total_mapsto two zero)                        (total_merge (total_mapsto seven nine)                                     (total_mapsto eight one)))           (delta_merge (total_mapsto one five)                        (total_merge (total_mapsto two zero)                                     (total_merge (total_mapsto seven nine)                                                  (total_mapsto eight one)))))      (pi_list seven                (cncat nine (cncat five epsilon))               (total_merge (total_mapsto one five)                            (total_merge (total_mapsto two zero)                                         (total_merge (total_mapsto seven nine)                                                      (total_mapsto eight one)))))))) (check-sat)`

## May 14, 2017

I haven't been updating this diary for a time, partly because my 11-credit coursework in the last semester was too heavy, and partly because writing formulas and codes on wiki is painful. Nevertheless, I will restore writing down my research diary from now on.

I am currently working on a conversion from matching logic to first-order logic, with a special way to encode functions and partial functions. The way I deal with functions is quite simple, while the way how we deal with partial functions seems a bit interesting. There are huge literatures on partial first-order logic and partial algebras, and some of them use quite tedious notations and don't get an elegant result. In a word, it is tricky and not trivial at all. On the other hand, the conversion that we found cannot be more intuitive. For any partial functional symbol f in matching logic, we introduce in first-order logic a (total) function total_f and a predicate symbol def_f, and define the following conversion rule

`f(x1,...,xn) => def_f(x1,...,xn) /\ total_f(x1,...,xn).`

I am going to implement this in the ml2fol converter in OCaml. But still I have to implement a prototype for matching logic prover in Maude, in a smart(er) way I suppose.

## Jan 02, 2017

One cannot define "=" as a symbol in matching logic. Lots of intricate staffs. I am very impressed.

## Dec 10, 2016

Classic Mistakes in Testing says that "A tester's itch to start breaking the program is as strong as a programmer's itch to start writing code - and it has the same effect: design work is skimped, and quality suffers." I understand that itch. So is there a way to help programmers write their design down?

A good quote: "Writing is a wonderful human invention: use it.".

## Oct 18, 2016

I had a long discussion with Daejun in the afternoon. We found a very subtle and tricky thing about the Boolean expressions in matching logic. The nonlogical symbol "and" for Boolean expression is different from the built-in one , which will be interpreted with intersection of sets. Such difference will make a big difference when they appear together in a formula, and that often causes lots of confusion.

## Oct 14, 2016

Berdine et al studied in their FSTTCS'04 paper what they called A Decidable Fragment of Separation Logic. The heap assertions are not allowed to overlap, and only one data structure of linked lists is concerned. Over such constraints, a complete proof system can be shown in a very nontrivial way. The most interesting fact is that one can reduce the list predicate by only considering two cases. One is when the list is an empty list, and the other is when the list contains only two elements. I haven't get the point very clearly, but it seems to be the breaking point of the paper.

## Oct 12, 2016

I had some more words to say about the reachability property in the first order graph theory. The first order graph theory is a first order language L with a new relation symbol E that is interpreted as the edge relation of a graph. The theory contains one nonlogical axiom

`∀x.¬E(x,x)`

that forbids self-loops. One wants to define reachability Reach(x,y)in our formal theory, which could be inductively defined as

`Reach0(x,y) ≡ x = yReach1(x,y) ≡ ∃z.E(x,z) ∧ Reach0(z,y)Reach2(x,y) ≡ ∃z.E(x,z) ∧ Reach1(z,y)...Reach (x,y) ≡ ∃n.Reachn(x,y)`

Unfortunately, Reach(x,y) defined in this way is not a legal well formed formula. In fact, there does not exist a formula φ(x,y) with two free variables x and y such that a model G satisfies φ in an interpretation that maps x to a node u and y to a node v in G, whenever v is reachable from u in G.

Assume there exist such a formula φ(x,y), then we build the following three formulas

`ψ1 ≡ ∀x∀y.φ(x,y)                                   /* G is strongly connected */ψ2 ≡ ∀x∃y.E(x,y) ∧ ∀x∀y∀z.(E(x,y)∧E(x,z)→(y=z))   /* For any node, there is one outgoing edge */ψ3 ≡ ∀x∃y.E(y,x) ∧ ∀x∀y∀z.(E(y,x)∧E(z,x)→(y=z))   /* For any node, there is one ingoing  edge */`

Therefore, the conjunctive formula ψ ≡ ψ1 ∧ ψ2 ∧ ψ3 means the graph G is a circle. Every circle is a model of ψ, and circles can get arbitrarily large, so there must exist an infinite model (based on a theorem in logic), and leads to the contradictory that there does not exist an infinite circle.

## Oct 7, 2016

I had a fruitful discussion with Grigore today. The most interesting thing I learned is what we mean by saying a logic is sound and complete. Suppose we have a formal language L in which we have some logical axioms comes with it. In many cases, we want to prove such a logic is sound and complete, and once we done that, we think we are fine. Because the logic is complete, then for any formula P,

`|- P iff |= P`

The fact, however, is that whenever we use this formal language in practice, such completeness does not help us a lot, because we will be working with some model M of the logic, who may have arbitrarily complex properties, say, one of which is Q. The property Q will not, as it should not, be captured by our logic L, because Q is so unique that it only belongs to the real world M that we are working in. As a result, even if our logic is complete, we will not be able to give Q a formal proof.

## Oct 5, 2016

It was interesting to know that in first order graph theory, some P problems cannot be represented as first order formulas. One such typical example is the readability problem, which asks whether a vertex u can reach the other vertex v in a directed graph. That is part of the reason why there were some researchers studied a decidable fragment of some logic .

I was not very sure why specifications need to be reusable, as is suggested in runtime verification .

## Oct 4, 2016

I was looking for papers on separation logic provers because I want to have a clearer picture of what a matching logic prover should be like. I soon found a milestone paper by Berdine et al. in 2004 which proposed an aim to "provide algorithms and tools to transfer the simplicity of handwritten proofs with separation logic to an automatic setting".

## Oct 3, 2016

When I went through the tutorial of K, I got interested in the lambda calculus and found a very concise introduction by Barendregt and Barendsen. There are a couple of notation abbreviation conversion that I obviously need to take care of, but the first nontrivial thing is the fixed point theorem, which says that for any F there exists an X = Y F such that F X = X. The lambda term Y plays a similar role as an infinite while-loop, that is,

`Y F = F(F(F(...))).`

The Y term plays an important role in recursively defining things.

Dr. David Garlan gave a talk on self-adaptive system, which seems to be a control system over software projects. The idea is to get rid of ad-hoc low level programming techniques including exceptions, timeouts, etc., and replace then with an integrated "smart" control mechanism (a control panel) that senses and affects the executing system.

## Sep 25, 2016

Ciobaca et al. shows in their A Language-Independent Proof System for Full Program Equivalence that a specification of a piece of code is a reachability rule.

## Sep 20, 2016

I reviewed some basic concepts in first-order logic. I believe it is always a good practice to use a word when you are 100% sure what it means.

A first order language consists of two categories of symbols. There are logical symbols that include variables, logical connectives, logic quantifiers and the equality symbol. There are also nonlogical symbols that consist of the signature of the language. Nonlogical symbols contain constant symbols, function symbols and relation symbols. If we pick some formula in the language and make them axioms, then we will have a first-order theory.

A structure or interpretation of the language contain a universe as the domain where variables are interpreted. It also associates constant symbols, function symbols and relation symbols with constants, functions and relations over the domain.

When it comes to separation logic, things become more complicated because we have one sort of variables that are interpreted as integers, and another sort of variables that are interpreted as heaps. Therefore, the first-order language that we use to represent separation logic is a two-sorted languages, in which we have both registers variables and heap variables. An assertion in separation logic will have to be transformed to be considered as a well-formed formula in the two-sorted first order language.

## Sep 19, 2016

I reviewed some basic concepts in mathematical logic. An inference rule

`premise-------conclusion`

simply means that if (under an interpretation) the premise holds, then the conclusion holds. In other words, the conclusion of any rule of inference (should) be a tautological consequence of its premises. Reynolds' posted such an inference rule in the separation logic paper that says

`p1 => p2  q1 => q2------------------p1 * q1 => p2 * q2`

which is not valid unless we provide more restrictions on p1, p2, q1, q2 .

## Sep 16, 2016

I found a couple of technical mistakes in Reynolds' Separation Logic: A Logic for Shared Mutable Data Structures. Besides that, the paper indeed propose an interesting example and show a sound motivation. The example is the following algorithm that reverse a link list.

`j := nil;while i != nil do  i, j, [i+1] := [i+1], i, j;`

We can easily guess an invariant candidate for the while-loop, but in order to prove that the candidate is indeed an invariant, we need a formal definition of what we mean by saying "i points to a list". Further, we need to specify that i and j will not reach the same fragment of memory except nil. Such sanity conditions are essential to prove the invariant condition but are tedious and not scalable. That is why we want to have separation of concern when reasoning programs that manipulate data structure in a dynamic way.

## Sep 12, 2016

I had a discussion with Grigore about why and how matching logic is a game changer. Take Hoare logic as an example. Hoare logic has many good properties (such as compositional reasoning) because the semantic model of the logic, i.e., the IMP language itself is compositional reasonable. And IMP is a very simple programming language. If we want something similar in other programming language, for example, Java, we need to manipulate and modify the Hoare logic to some “JHoare logic”. Similarly, separation logic has a semantic domain of IMP with a heap. Separation logic endorses the frame rule because IMP with a heap endorses the frame rule. If we want to use separation logic in other languages, say, Java, we need to modify the logic to some "JSepaLogic".

What matching logic does is to forget all J-xxx staff. There is only one logic called matching logic. If we are using IMP with a heap, then the semantics of the language should make us able to proof a (meta-)theorem that serves as the frame rule. If we are using Java, then the semantics of Java should make us able to proof a (meta-)theorem that serves as the frame rule, or other interesting rules. If we are using some odd languages that do not satisfy the frame rule, we still might proof some other interesting (meta-)theorem about its nature. One logic, any semantics.

## Sep 11, 2016

I personally found O'Hearn's Separation Logic Tutorial very useful as a link source to other papers. The second section "Model theory and proof theory" contains many interesting papers on separation logic and more. From there, I found some papers by Yang discussing the so-called "local reasoning", which highlights the difference between specifications in separation logic and the Hoare triples. The main difference seems to be the main contribution that separation logic brings to us, that is, to overcome the frame problem. Frame problem is stated informally as follows. In mathematical logic, especially first-order logic, it is easy to define an action by giving its pre- and post-conditions. An action might change one or some aspects of the system and keep the irrelevant ones unchanged. Frame problem is all about those unchanged aspects. Obviously, we want to prevent verbosely specify in the specification of an action that it will not change any irrelevant attributes of the system, otherwise the logic would become too verbose to use in practice. Separation logic solves the problem by introducing a separation conjunction operator and together with a frame rule.

## Aug 16, 2016

I will be doing a short-term individual project on separation logic and matching logic with Grigore Rosu. The research currently being undertaken in FSL has two tracks. One is the K framework and the other is runtime verification. The future plan is to merge those two tracks, to which I will contribute.