Bill and I have worked out an approach to using a GMP algorithmic theory to do natural number computation using trustable communication.
To begin with we assume we are given an algebraic theory of integer arithmetic called GMP.
GMP contains transformers for various integer operations, such as addition, multiplication, integer square root, etc. There are meaning formulas for these transformers whose instances are all Δ0, which makes all the "axioms" of GMP these Δ0 sentences.
What does it mean for GMP to be correct? It means that all these meaning formulas are theorems of a theory of integer arithmetic. Define IA to be a theory of integer arithmetic in the same vein as Peano arithmetic. For completeness, let us assume that we have some full induction principle for integer arithmetic.
There is a trivial translation from GMP to IA, which is the identity translation, since the two theories share the same language. GMP is "correct" if this identity translation is a theory interpretation. Proving that GMP is "correct" is an extremely difficult task, and so at this point we will simply trust that it is correct.
I think in practice one might unify GMP and IA into one theory, but the separation between the two theories in the above presentation, at the very least, helps to clarify what is going on and what is trusted.
As it stands GMP and IA only do integer arithmetic, but we want to do natural number arithmetic. Natural number arithmetic is compatible with integer arithmetic, but this fact requires justification. To this end we extend IA and GMP to IA' and GMP' to support natural number arithmetic.
IA' defines a type of natural numbers as a subset of integers and defines natural number operations on this type of natural numbers. From this, one should be able to prove Peano's axioms. We also extend GMP to GMP', by again defining a type of natural numbers. We define a second set of meaning formulas for natural number addition. These meaning formulas quantify over numerals for natural numbers and relate natural number operations to the integer transformers.
The new meaning formulas provide a whole host of new "axioms" for GMP', and we need to check that they are correct. We could again trust that it is correct by assuming the identity translation to IA' is an interpretation. However, instead we can prove that it is relatively correct to GMP. That is, we can prove that if GMP is correct then GMP' is correct. To do this we simply prove that all the "new axioms" of GMP' are theorems of IA' under the assumption that the "axioms" of GMP hold. Each "new axiom" of GMP' should follow from the corresponding integer "axiom" of GMP by the theory of IA'.
From here on out, we closely follow the old trustable communication framework. To use this newly created natural number service, say from PA, we first build a translation from PA to GMP', which is easy since GMP' contains the language of PA. We "run the service" which means using the translated data to pick out an instance of a meaning formula for one of the natural number operations in GMP'. Call this instance φ. Next we send φ through the semi-trusted identity interpretation to IA'.
To complete the query we need an interpretation from IA' to PA, but this requires some preparation. We extend PA to PA' to define the integers from the natural numbers using the ambient set theory of Chiron to construct them. It is important to construct the integers in such a way that the natural numbers are a subset of them, not just isomorphic to a subset of them. We include definitions for all the integer operations.
Now we can translate from IA' to PA'. We claim that this translation is an interpretation. In order to prove this, it suffices to prove that the translations of all the axioms of IA' hold in PA', which should be the case. Finally we interpret φ to PA' and then expand away all the definitions of PA' to get a final answer in PA. This expansion of definitions in practice will do nothing since φ will actually already be in the language of PA.
I’m working on this idea of considering theory signatures as contexts of some dependent type theory and theory presentations as a category fibred over the category of contexts. A model of a theory presentation corresponds to a point of said object, that is a morphism from the terminal theory presentation (i.e. the empty context with no axioms) to said object. The fibration projects models down their underlying structure, which is a point in the corresponding theory signature, that is a morphisms from the terminal theory signature (i.e. the empty context) to said theory signature, which in turn corresponds to a term fulfilling the theory signature.
Given that two structures for signature S are arrows M N : 1 → S, it seems natural to represent homomorphisms from M to N as a 2-cell F : M ⇒ N.
If our category of theory signatures is going to actually be a 2-category then we need to consider 2-fibrations, which seem to have been worked out in Claudio Hermida’s paper entitled "Some Properties of Fib as a fibred 2-category".
More importantly I need to work out what a 2-cell in the category of contexts would be exactly. Some things are pretty clear; for the context containing x:Type and when the corresponding components of the structures, M and N, are ⊢ s:Type and ⊢ t:Type, we want the corresponding component F : M ⇒ N to be a term ⊢ Fx : s → t. If the context contains an operation f : x → x, and the corresponding components of the structures are ⊢ g : s → s and ⊢ h : t → t, then we want the corresponding component of F : M ⇒ N to be the constraint ⊢ Ff : Fx∘g =s → t h∘Fx. But to work it out for the general case of dependent type boggles my mind.
There is an interesting downward transformation going on here where types in the signature turn are realized by (function) terms in the 2-cells, and terms in the signatures are realized by (equality) propositions in the 2-cells, and propositions (in theory presentations) will come for free in the 2-cells (is this always true?). Perhaps kinds will realized by types?
This all vaguely resembles Reynold's abstraction theorem.
A : Type;
B : A -> Type;
f : forall x : A. B x
=====================
|- f =η fun (x:A) => f x
A : Type;
B : A -> Type;
C : SigT A B -> Type;
f : forall p : SigT A B. C p
============================
p : SigT A B |- f p =η match p as p0 return C p0 with existT x t => f (existT A B x t) end
f : unit
========
|- f =η tt
A : Type;
x, y : A;
C : forall y, Eq A x y -> Type;
f : forall e : Eq A x y -> C y e
================================
e : Eq A x y |- f e =η match e as e0 in Eq _ _ z return C z e0 with refl => f refl A x end
A, B : Type;
C : A + B -> Type;
f : forall p : A + B. C p
=========================
p : A + B |- f p =η match p as p0 return C p0 with left x => f (left x) | right y -> f (right y) end
C : Void -> Type;
f : forall p : Void . C p
=========================
p : Void |- f p =η match p as p0 return C p0 with end
Inductive W (A:Type) (B:A -> Type) : Type :=
node : forall (a:A) (f:B a -> W A B), W A B
A : Type;
B : A -> Type;
C : W A B -> Type;
f0 : forall p : W A B. C p;
f1 : forall (a : A) (f:B a -> W A B), (forall b:B a. C (f b)) -> C (node a f)
-----------------------------------------------------------------------------
a : A, f: B a -> W A B |- f0 (node a f) =βη f1 a f (fun b => f0 (f b))
=============================================================================
p : W A B |- f0 p =η W_rect A B C f1 p
C : nat -> Type;
f0 : forall n : nat. C n;
f1 : forall n : nat. C n -> C (S n)
-----------------------------------
n:nat |- f0 (S n) =βη f1 (f0 n))
===================================
n:nat |- f0 n =η nat_rect C (f0 O) f1 n
A container is a specification of S and P and is given by the dependent type s:S ⊢ P s a.k.a. the telescope ΣS:★. S → ★. A container specifies the (strictly positive) functor λX:★. Σs:S. (P s) → X.
An indexed container over I is a specification of S and P and is given by the dependent type ΣS:★. I → S → ★.
An indexed container specifies an indexed Functor λX:I → ★. Σs:S. Πi:I. (P i s) → X i.
Containers form a commutative ★-semi-algebra a.k.a. a ★-semiring (not to be confused with a *-semiring) with the following operations:
0 := 〈⊥, []〉
1 := 〈⊤, λs. ⊥〉
〈A, P〉 + 〈B,Q〉 := 〈A + B, [P, Q]〉
〈A, P〉 × 〈B,Q〉 := 〈A × B, λ〈a,b〉. P a + Q b〉.
R·〈A,P〉 := 〈R × A, λ〈_,a〉. P a〉
Containers (aka ⊤-indexed containers) form a ★-bi-semiring. The tensor of I indexed containers with J indexed containers is (I + J)-indexed containers. The identity of this tensor are the ⊥-indexed containers which are isomorphic to ★. The tensor of an I indexed container and a J indexed container is:
〈A,P〉 ⊗ 〈B,Q〉 := 〈A × B, λ(ij:I+J) 〈a,b〉. [λi. P i a, λj. Q j b](ij)〉.
In particular the tensor of containers with themselves are the 𝔹-indexed containers. The following are the ★-bi-semiring operations on containers
ε+〈A,P〉 := Σa:A. P a → ⊥ (≜ Σa:A. ¬(P a))
ε×〈A,P〉 := Σa:A. P a → ⊤ (≅ A)
Δ+〈A,P〉 := 〈Σa:A. P a → 𝔹, λ(i:𝔹) (〈a,f〉:Σa:A. P a → 𝔹). Σp:P a. i = f p〉
Δ×〈A,P〉 := 〈A, λ(i:𝔹). P〉
β(R)〈A,P〉 := Σa:A. P a → R
The composition product of I-indexed containers with J-indexed containers are the (I × J)-indexed containers, with containers (ie ⊤-indexed containers) as the unit for the composition product. The composition product of an I-indexed container with a J-indexed container is as follows.
〈A,P〉 ⊙ 〈B,Q〉 := 〈Σa:A. Πi:I. P i a → B, λ〈i,j〉 〈a,f〉. Σp : P i a. Q j (f i p)〉
Now containers trivially form a semi-plethory since ⊤-indexed containers are clearly isomorphic to (⊤ × ⊤)-indexed containers.
ι〈A,P〉 := 〈A,λ〈〉. P〉
(∘)〈A,P〉 := 〈A,λ〈〉.
P〈〈〉,〈〉〉〉
For convenience we define the infix composition of two containers as:
〈A,P〉 ∘ 〈B,Q〉 := 〈Σa:A. P a → B, λ〈a,f〉. Σp : P a. Q (f p)〉 (≅ (∘)(〈A,P〉 ⊙ 〈B,Q〉))
Composition has an identity element, I, which doesn't appear to be a primitive semi-plethory operation for some reason. Nor do I know how to derive it from other operations. But here it is anyways.
I := 〈⊤, λs. ⊤〉
TODO: Actually verify these operations are correct by checking that all the required laws hold (upto isomorphism).
P.S. Actually containers do form a *-semiring, which makes it a ★-*-semiring. Actually we can get a ★-bi-*-semiring with the following asteration operation on containers and its dual.
〈A,P〉* := 〈ℕ, Fin〉 ∘ 〈A,P〉 (≅ 〈List A, sum ∘ map P〉)
〈A,P〉⁎ := 〈A,P〉 ∘ 〈ℕ, Fin〉 (≜ 〈Σa:A. P a → ℕ, λ〈a,f〉. Σp:P a. Fin (f p)〉)
Recalling the problem: On one side we have a computational theory call GMP that consists entirely of Δ0 theorems. On the other side we have a purely axiomatic theory called PA that is a high-level theory of Peano Arithmetic that has been conservatively extended to support decimal notation and whatever other reasonable conveniences for the natural numbers we want. The task is to use trustable communication to use the computational ability of GMP to solve simple arithmetic problems in PA.
The idea here is to extend GMP to a stronger theory GMP0 that is strong enough to define PA. Next we make a conservative extension of GMP0 to GMP' that defines the language of PA and proves the axioms of PA. Specifically GMP' defines
nat := { a : int | 0 ≤int a}
+nat := λ (a,b) : nat × nat. ι c : nat. c = a +int b
…
Here we note that PA is a subtheory in the sense that it uses a sublanguage of GMP' and every theorem of PA is a theorem of GMP' (since GMP' proves all the axioms of PA).
The next step is to prove that GMP' is, in fact, a conservative extension of PA (thus GMP0 shouldn't be more powerful than PA). The idea here is to make an interpretation from GMP' into PA that fixes the language of PA. Unfortunately we cannot directly do this because interpretations in Chiron must be injective. So instead we note that if we make an interpretation from GMP' into a conservative extension of PA, called PA', that fixes the language of PA, then we can conclude that GMP' is a conservative extension of PA
To build this interpretation we extend PA to PA' with definitions:
int := nat×{0} ∪ {0}×nat
+int := λ ((ap,an),(bp,bn)) : int × int. ι (cp,cn) : int.
cp + an + bn = ap + bp + cn ∧ (cp = 0 ∨ cn = 0)
…
Out interpretation sends int to int, +int to +int, etc. and fixes the lanuage of PA, so it sends nat to nat, and +nat to +nat.
Unfortunately this doesn't work. GMP' has a theorem (actually a definition) that nat = { a : int | 0 <=int a}. If we look at this theorem under the above alleged interpretation, it is equivalent to nat = nat×{0}, which is clearly false. So our alleged interpretation isn't actually an interpretation because it isn't actually truth preserving.
Let us suppose for the sake of argument that we somehow prove that GMP' is a conservative extension of PA. What we then need to do is to write new services for PA that use GMP' services internally to help. We need to do something like writing new services because someone has to be responsible for proving that the operations on integers are, in fact, compatible with the operations on natural numbers. In our case it will be the responsibility of the person writing these new services to prove that the services are conservative (i.e. all the meaning formulas of the services are theorems). (More details on how to make this service is needed). With new services and new meaning formulas, there is no problem with translating the meaning formula through trustable communication since the meaning formula is no longer passed through the trustable communication framework.
In this sense we have sort of failed at our task which was to use the trustable communication framework; however I'm inclined to think that the original trustable communication framework cannot actually solve this particular communication problem since it leaves no room for the required proof that the computation of the natural numbers is compatible with computation on the integers. Therefore, I think it is inevitable that something beyond trustable communication is needed in this particular situation.
Other situations that do not have this compatibility issue are likely more amiable to using trustable communication. In particular our new natural number services could be served through a trustable communication channel.
An R-R'-biring is an R-ring A such that for every other R-ring B the set of homomorphisms from A to B has the structure of an R'-ring. This is equivalent to giving the following additional operations on A with the following laws:
This gives us enough structure to show that for any R-ring B, the homomorphisms from A to B have a R'-ring structure. For instance given R-ring homomorphisms f, g : A → B and r : R' we can define
0 : A → B is 0(a) = ε+(a)⋅1
1 : A → B is 1(a) = ε×(a)⋅1
f + g : A → B is f + g = ∇ ∘ (f⊗g) ∘ Δ+
-f : A → B is -f = f ∘ σ
fg : A → B is fg = ∇ ∘ (f⊗g) ∘ Δ×
r⋅f : A → B is r⋅f = f ∘ λ ∘ (β(r)⊗id) ∘ Δ×
A biring over R is an R-R-biring.
A plain biring is a biring over ℤ just like a plain ring is a ℤ-ring.
R itself can be made into a trivial R-R'-biring over R where
ε+(a) = ε×(a) = a
σ(a) = a
Δ+(a) = Δ×(a) = a⊗1 = 1⊗a
β(r)(a) = a
R[X] is a biring over R where
ε+(X) = 0 (polynomial evaluation at 0)
ε×(X) = 1 (polynomial evaluation at 1)
σ(X) = -X (polynomial evaluation at -X)
Δ+(X) = X⊗1 + 1⊗X (polynomial evaluation at X+Y)
Δ×(X) = X⊗X (polynomial evaluation at XY)
β(r)(X) = r (polynomial evaluation at r)
A R-R'-biring A inducts a functor from R-Rings to R'-Rings that takes an R-Ring B to the R'-Ring of homomorphisms from A to B.
Birings are composable. Given an R-R'-biring A and and R'-R''-biring B, then there exists an R-R''-bring A⊙R'B. (⊙R') is called the composition product.
First we define a R-ring A⊙R'B for an R-R'-biring A and an R'-ring B in a way similar to the definition of a tensor product: A⊙R'B can be constructed as A×B modulo the following equalities:
(a + b)⊙x = (a⊙x) + (b⊙x)
ab⊙x = (a⊙x)(b⊙x)
(r⋅1)⊙x = r⋅(1⊙y) (this y is not a typo: 0⊙x = 0⊙y = 0 and 1⊙x = 1⊙y = 1)
Think of a⊙x as the polynomial a evaluated at the point x.
If B also happens to be a R'-R''-biring, then we can make A⊙R'B into an R-R''-biring.
ε+(a⊙x) = a⊙(ε+(x)) where a⊙r = β(r)(a)
ε×(a⊙x) = a⊙(ε×(x))
σ(a⊙x) = a⊙σ(x)
Δ+(a⊙x) = a⊙Δ+(x) where a⊙(w1⊗w2) = (a⊙w1)⊗(a⊙w2)
Δ×(a⊙x) = a⊙Δ×(x)
β(r)(a⊙x) = β(β(r)(x))(a)
(⊙R) and R[X] make the category of birings over R into a monoidal category. (TODO: Why is this the Tall-Wraith monoid?). We want to check first that B is isomorphic to R[X]⊙RB. The isomorphism maps x ↔ (X⊙x) (the reverse map extends by the composition product laws). Similarly we want to check that A is isomorphic to A⊙RR[X]. Again, the isomorphism maps a ↔ (a⊙X).
an R-plethory is a monoid in this monoidal category of birings over R. This means that an R-plethory is a biring over R, call it A, that also has the following operations forming a monoid:
(∘) : A⊙RA → A
ι : R[X] → A
R[X] is an R-plethory with ι the identity function and (∘)(X⊙X) = X which is effectively entails polyonomial composition.
According to the mathoverflow answer, R[X] is an initial R-plethory which seems reasonable.
I am not entirely sure this is all correct. But I'm moderately confident.
I will (hopefully) continue to update this post with further notes.
Here I present what I think is a slightly simpler proof of the Primitive Element Theorem's so called "finite case" than the one found in A Course in Constructive Algebra. It still follows the most important ideas presented in that book. In particular it is important to use the correct constructive phrasing of the "finite case".
Theorem:Primitive Element Theorem: Finite Case Given a discrete field F and two algebraic values a and b over F of degrees m and n respectively then forall N either #|F| > N or there exists some g in F[a,b] such that F[a,b] = F[g].
Of course by #|F| > N we really mean the statement that there exists some finite set of elements of F that has cardinality greater than N.
We can assume that a ≠ 0 and b ≠ 0 since in these cases F[a, b] = F[b] and F[a, b] = F[a] respectively.
Every element x ∈ F[a] is a linear combination of a0, …, am-1 with coefficents in F. Say x = Σxiai. Consider the list [a0, …, aNm], and then consider the finite set of coefficients S = {(ai)j | i ≤ Nm, j < m}. If S has cardinality greater than N then we are done.
Suppose S has cardinality less than or equal to N. Then there are at most Nm linear combinations of the coefficients from S on [a0, …, am-1], and since the list [a0, …, aNm] has length Nm+1 then there must be two elements of this list that are the same. By dividing these two elements we conclude there is some i such that ai = 1a has multiplicative order i.
By a similar argument we can conclude that either #|F| > N or there is some j such that bj = 1b has multiplicative order j. Now we can proceed under the assumption that both ai = 1a has multiplicative order i and bj = 1b has multiplicative order j. The rest of the argument is standard.
Let f : ℤi×ℤj → F[a,b] be the function that sends (x,y) to axby. The function f is a group homomorphism into the multiplicative group of F[a,b], and therefore ℤi×ℤj must be cyclic and is generated by some z. Let g = f(z). Then there is some ka and ka such that gka = a and gkb = b, and thus F[a, b] = F[g]. That concludes the proof.
The infinite case of the Primitive Element Theorem states that there exists some N (specifically I think m*(n-1)+1 should work) such that if #|F|>N and b is separable over F then there exists some g in F[a,b] such that F[a,b] = F[g]. This can be proved as done in A Course in Constructive Algebra or if you want to avoid the construction of splitting fields you can follow the argument in David Speyer or Boyarsky's proof. The infinite case combined with the finite case allow us to prove the primitive element theorem:
Theorem:Primitive Element Theorem Given a discrete field F and two algebraic values a and b with b separable over F, then there exists some g in F[a,b] such that F[a,b] = F[g].
Inspired by how Bas and Eelis handle morphisms in their work, I've rethought how morphisms can be implemented in Mathscheme.
Consider how to build a theory of monoid morphisms. One can build this theory by first taking a pushout of the theory of monoids with itself over the empty theory (i.e. take the disjoint union of the theory of monoids with itself) this gives us two copies of the theory of monoids and each copy has a separate universe: U1 and U2. Then we extend this theory with a function f : U1 → U2 and axioms that says that f preserves each operation. This procedure works in general for algebraic theories.
Notice that this gives us a theory of monoid morphisms between some monoids, but not the theory a theory of monoid morphisms between two particular monoids, which is what we are usually interested in. However, by considering this theory of monoid morphisms as parameterized by two monoids in the sense of my last post one can build models of the theory of monoid morphisms between particular monoids.
This way of developing theories of morphisms has some nice properties. For example, we usually only care about vector space morphisms (aka linear transformations) between two vector spaces over the same field. This sharing of the same field is easy to do with the above technique. Instead of taking the pushout of the theory of vector spaces with itself over the empty theory, one would take the pushout of the theory of vector spaces with itself over the theory of fields. This forces the theory of fields to be shared between the two copies of vector spaces, ensuring that the domain and codomain of a field morphism are both vector spaces over the same field.
Some algebraic theories are parameterized by other algebraic theories. The most common example would be vector spaces, which are parameterized by a field that acts on the vector space by scalar multiplication.
In mathscheme, the theory of vector spaces contains types for both the field and the vector space, and provides (via inheritance) all the axioms of a field for the field type and then goes on to provide the vector space axioms (via various mix-ins). When this theory is reflected as a type we get a record for both the field component and the vector space component.
However, in may cases we want to reason about vector spaces whose field component is fixed to some particular value. One idea to support this sort of thing is to not reflect the theory as a type, but as a dependent function type. In this case, a function from the type of field to a type. Loosely speaking it would be
VS : {U:type; +, *: U*U -> U; -, inv : U -> U; 0 1 : U } -> type
VS F = {V:type, + : V*V -> V; - : V -> V; 0: V; *: (F.U)*V -> V}
I have left out the axioms above for simplicity, but the axioms would bring the operations of F into play. These parameterized models can probably be created for any theory morphism between theories.
Unparameterized models could then be simply a degenerate case of parameterized models over the initial empty theory. The models of the empty theory have type {} and there is exactly one value of this type {}.
To further help modularize theory development, it may be useful to take pushouts of not just theory presentations themselves, but to take pushouts of diagrams of theory presentations. The following is an elaboration of what Douglas Smith describes in his various papers on SpecWare including "Mechanizing the Development of Software". Although we are interested in diagrams of theory presentations, this development is generic for diagrams for any category with sufficient structure (ie, categories that have colimits.).
First we define what a graph is. A graph is a finite collection of nodes and a finite collection of directed edges between those nodes with a composition operation on edges (composition requires the target node of the first edge be the same as the source node of the second edge). The set of edges is closed under composition and for each node there is an identity edge from the node to itself. We require that the identity edge be an identity for the composition operation in the usual sense.
Actually, an graph is just a category, and I could have simply said that from the beginning. However, I will use the name graph in the cases where we don't really care about the semantics of the category and we only care about the shape of the category.
This definition of a graph differs notably from the definition of directed multigraph from graph theory because our definition of a graph says whether following two paths from a source node to a target node when composed together give equal edges or not. In particular our graphs tell you whether two particular squares commute or not. Usually this information is left implicit when a diagram is drawn on a board, but such information is a vital component of the definition.
Loosely speaking, a diagram is a graph where the nodes and edges are given labels from another category C in a way that is compatible with the category C. In our particular case, we are interested in C = the category of theory presentations, but this development is generic in C.
More formally speaking, a diagram is a functor d : G → C. where G is a graph that gives the shape of the diagram. The image of a node of G under d labels the node by an object from C, and the image of an edge of G under d labels the edge by an arrow from C. The law for a functor ensure that the labeling of nodes and edges is compatible with C.
Note that is definition of a diagram differs notably from the definition of a subcategory. In a diagram two different nodes could be get the same label, and two different edges could also get the same label. In otherwords, the functor d : G → C need not be injective.
All diagrams are functors (which are arrows in the category of categories) with codomain C. The arrows of a category with a constant codomain form the objects of a slice category, so that is an obvious choice for turning diagrams into a category, but we need something a bit more lax: a lax slice category.
The objects of a lax slice category are all functors that have the same codomain (in this case C). A morphism between d1 : G1 → C and d2 : G2 → C in a lax slice category is a functor f : G1 → G2 along with a natural transformation ηf from d1 to d2∘f.
What does this definition mean for our diagrams? Given two diagrams d1 and d2, a diagram morphism between these two diagrams states two things. Firstly it assigns for each node in the graph G1, a node in the graph G2, and for each edge in the graph G1, an edge in the graph G2 in a way that is compatible. This first part is the information held by the functor f : G1 → G2 from the previous paragraph. The second part of a diagram morphism is a collection of arrows from C (in our case theory morphisms), one for each node in G1. These arrows map the label for a node in G1 to the label for the corresponding node (under f) in G2. This collection of arrows is given by the natural transformation η in the previous paragraph. Lastly, this collection of arrows is subject to a compatibility constraint that for every edge e : p → q in G1, there is a commutative square that states that f(q)∘e = η(e)∘f(p). This condition is what makes the transformation η natural.
The laxness of the lax slice category is important. Without it we would end up with the stricter requirement that d1 = d2∘f or in other words that the natural transformation is always the identity transformation. This stricter requirement would mean that the collection of arrows between diagrams always be identity arrows, and the only diagram arrows that would exist would be diagram inclusions. We require this laxness to get pushouts to generate a collection of novel theories. Without laxness we could only create the union of diagrams, which is not so interesting.
I believe it is true that if the category C has colimits, then this (lax) category of diagrams will have colimits, but I don't have a reference that proves this.
Now I will show how to use pushouts of diagrams to generate large batches of novel theories. For simplicity consider the diagram B generated from the theories: Semigroup, Monoid, and LeftNearSemiring with the usual arrow from Semigroup to Monoid, and the "multiplicative" arrow from Semigroup to LeftNearSemiring and the "additive" arrow from Monoid to LeftNearSemiring. This example is small enough to be easy to picture, but large enough to be a non-commutative diagram.
We want to use diagram pushouts to create commutative versions of these theories. To do this we will take as base A, the diagram consisting of the single node Semigroup (which implies the single identity arrow on that node), and for C we will take the single node CommutativeSemigroup. The diagram arrow from A to B will map the Semigroup node in A to the Semigroup node in B via the identity morphism. The diagram arrow from A to C will make the Semigroup noda in A to the CommutativeSemigroup node in B via the theory inclusion.
When we take the pushout of this span we get a new diagram D along with diagram arrows from B to D and C to D with the universal properties that pushouts have. What will the contents of D be? The shape of the diagram D will be the the disjoint union of the shapes of B and the shapes of C modulo the shape of A. So in this case that gives us the shape of B plus another point that in this case will be identified with the semigroup node of B. This moding out leaves us with a diagram with the same shape as B that we started with.
What are the contents of D? In this case, what was the semigroup of B has been transformed into the commutative semigroup that we gave in C. The morphisms of the diagram arrow from B to D includes our inclusion from semigroup to commutativeSemigroup. And the diagram arrow from C to D consists of the identity arrow from commutativeSemigroup to commutativeSemigroup. What was a monoid in B is a commutativeMonoid in D, and the diagram arrow from B to D contains a morphism from group to commutativeGroup. There is no morphism in the diagram arrow from C to D that targets commutativeMroup. Finally what was a LeftNearRing in B becomes a commutativeLeftNearRing where both multiplication and addition are commutative. Both operations are commutative because the two squares generated by the "multiplicative" arrow and the "additive" arrow must be commutative.
Now suppose we don't want to make both operations commutative. Say we only want to make the additive operation commutative. To achieve this we need to remove the multiplicative arrow from the diagram B yielding a diagram B′. Now the pushout of B′ and C over A yields a new diagram D′ which is similar to D, but where the LeftNearRing has been replaced with a LeftNearRing theory that only has addition commutative.
To be sure, we are not deleting the "multiplicative" arrow from our theory development. We are simply deciding to not include it in the diagram B′ we use to generate the pushout. This shows that we need to take arrows very seriously. Which arrows we include in our diagrams when doing diagram pushouts can have significant consequences on the generated theories.
Now suppose we want to have our cake and eat it too. We want to build all three extensions of LeftNearRing, one making the additive operation commutative, one making the multiplicative operation commutative, and one making both operations commutative. This is possible by creating duplicate copies of LeftNearRing in our diagram. One copy of LeftNearRing has only one arrow pointing to it from Monoid corresponding to the "additive" arrow, and another copy of LeftNearRing has only one arrow pointing to it from Semigroup corresponding to the "multiplicative" arrow. Finally we put in a third copy of LeftNearRing with two arrows from the other two LeftNearRing nodes are labeled with the identity morphism.
When we pushout with this diagram, each different copy of LeftNearRing will be assigned to a different resulting theory. The first will get commutative addition only, the second will get commutative multiplication only, and the last will get both commutative addition and commutative multiplication. The two arrows labeled with identity morphisms will transform into inclusions.
Suppose we have a diagram d of theories starting from Magma, and we want to extend this diagram with commutativity, associativity, and both commutativity and associativity. To do this we would use the above pushout mechanism twice, once to create dc the diagram with commutative structures, and another time to create da the diagram with associative structures. Then to create the diagram dac of commutative and associative structures, we do the pushout of dc and da over d, using the diagram morphisms from d when we created dc and da.
The store comonad (better known as the costate comonad) can be defined as
data Store b a = Store (b -> a) b
with the following comonadic operations
extract :: Store b a -> a
extract (Store v b) = v b
fmap :: (a -> c) -> Store b a -> Store b c
fmap f (Store v b) = Store (f . v) b
duplicate :: Store b a -> Store b (Store b a)
duplicate (Store v b) = Store (Store v) b
extend :: (Store b a -> c) -> Store b a -> Store b c
extend f x = fmap f (duplicate x)
Given this we can define a lens (aka a functional reference) as follows
type Lens a b = a -> Store b a
A lens can be used as an accessor to a field of a record, though they have many uses beyond this.
data Entry = { name_ :: String
; city_ :: String
; birthdate_ :: Data
}
name :: Lens Entry String
name entry = Store (\newName -> entry {name_ = newName})) (name_ entry)
Above we create a lens to access the name field of this Entry structure. We can build similar lenses to access other the fields.
We can write getter and setter functions for an arbitrary lenses.
get :: Lens a b -> a -> b
get lens a = b
where
Store v b = lens a
set :: Lens a b -> a -> b -> a
set lens a = v
where
Store v b = lens a
However, we expect the get and set functions to obey certain laws. Here are the three laws that Koji Kagawa gives in his paper "Compositional References for Stateful Functional Programming". They are the laws that any second year computer science student would intuitively write down.
get l (set l s b) = b
set l s (get l s) = s
set l (set l s b1) b2 = set l s b2
Today I was trying to rewrite these lens laws to use our comonadic operations on the store comonad instead of get and set. So after playing around I came up with the following alternative set of laws
extract . l = id
fmap l . l = duplicate . l
Interestingly these laws no longer depend on any special operations of the store comonad. They only use the operations generic for any comonad.
Do these laws look familiar? They should. They are exactly the laws of a coalgebra for a comonad!
I was floored by this. These three, almost arbitrary, set of laws that every computer scientist intuits are exactly the set of laws implied by this abstract nonsense from category theory. I honestly wrote these laws out completely before noticing what they were. I do not recall this fact noted on any paper about lenses that I have read, but it will definitely be mentioned in my upcoming paper.
Below is the Coq code that proves that these two sets of laws are equivalent.
Set Implicit Arguments.
Section Lens.
Variable B : Type.
Definition W a := ((B -> a) * B)%type.
Definition extract (a:Type) (x : W a) : a := fst x (snd x).
Definition fmap (a b:Type) (f : a -> b) (x : W a) : W b := pair (fun z => f (fst x z)) (snd x).
Definition duplicate (a:Type) (x : W a) : W (W a) := pair (pair (fst x)) (snd x) .
Definition extend (a b:Type) (f : W a -> b) (x : W a) : W b := fmap f (duplicate x).
Variable A : Type.
Variable l : A -> W A.
Definition rd (r : A -> W A) (a:A) : B := snd (r a).
Definition wr (r : A -> W A) (a:A) (b:B) : A := fst (r a) b.
Section Iso1.
Hypothesis coalgebra1 : forall a, extract (l a) = a.
Hypothesis coalgebra2 : forall a, fmap l (l a) = duplicate (l a).
Lemma lawA : forall s t, rd l (wr l s t) = t.
Proof.
intros s t.
generalize (coalgebra2 s).
unfold rd, wr.
destruct (l s) as [f b].
unfold fmap, duplicate.
simpl.
inversion 1.
change (l (f t)) with ((fun z => l (f z)) t).
rewrite H1.
reflexivity.
Qed.
Lemma lawB : forall s, wr l s (rd l s) = s.
Proof.
firstorder.
Qed.
Lemma lawC : forall s t1 t2, wr l (wr l s t1) t2 = wr l s t2.
Proof.
intros s t.
generalize (coalgebra2 s).
unfold rd, wr.
destruct (l s) as [f b].
unfold fmap, duplicate.
simpl.
inversion 1.
change (l (f t)) with ((fun z => l (f z)) t).
rewrite H1.
reflexivity.
Qed.
End Iso1.
Section Iso2.
Hypothesis law1 : forall s t, rd l (wr l s t) = t.
Hypothesis law2 : forall s, wr l s (rd l s) = s.
Hypothesis law3 : forall s t1 t2, wr l (wr l s t1) t2 = wr l s t2.
Lemma coalgebraA : forall a, extract (l a) = a.
Proof.
firstorder.
Qed.
(* One day we will have observational type theory, but for now ... *)
Require Import FunctionalExtensionality.
Lemma coalgebraB : forall a, fmap l (l a) = duplicate (l a).
Proof.
intros a.
apply injective_projections; auto.
apply functional_extensionality; intros x.
simpl.
apply injective_projections.
apply functional_extensionality; intros y.
firstorder.
firstorder.
Qed.
End Iso2.
End Lens.
Edit (2011-01-19): I renamed the Selection comonad to the Store comonad and rearranged the order of some parameters to make it more compatible with Control.Comonad.Trans.Store. Some of the comments below use to the old name.
Free theorems were popularized by Wadler in his 1989 paper "Theorems for Free!". In that paper only polymorphism over types is considered. In 2009, Voigtländer illustrates how this can be extended to polymorphism over type constructors with class constraints in his paper "Free Theorems Involving Type Constructor Classes". He uses polymorphism over Monads for his example. In order to learn to apply this theory I will use a very simple exercise only involving Functors. Be aware that our reasoning will be "fast and loose" which is a technical term that means that our results only apply to total values.
The exercise here is to prove that the type A → B and the polymorphic type ∀κ:Functor. κA → κB are isomorphic. The intuition is that all a polymorphic function of type ∀κ:Functor. κA → κB can do is map some internally held function of type A → B. The claim is that the following pair of functions witness the isomorphism:
iso1 : (A → B) → ∀κ:Functor. κA → κB iso1x := Λκ. λl. mapκxl
iso2 : (∀κ:Functor. κA → κB) → A → B iso2y := y Id
In the above we use Id, which is the identity functor.
To prove this we need to show that both ways of composing these two functions produce the identity function. One way is easy:
iso2 (iso1x) a =
mapIdxa =
xa
So we see that iso2 (iso1x) = x as required. The other direction is harder.
iso1 (iso2y) κl =
mapκ (y Id) l
At this point we get stuck. We need to show that mapκ (y Id) l = yκl for an arbitrary functor κ and arbitrary l : κA. To continue the proof, we are going to have to use the parametricity of y : ∀κ:Functor. κA → κB.
Parametricity of y gives us a free theorem which states that given two functors F1 and F2 and a functor morphism η : ∀α. F1α → F2α we have that:
ηB ∘ yF1 = yF2 ∘ ηA
A functor morphism is better known as a natural transformation. A natural transformation η : ∀α. F1α → F2α is required to satisfy the following law: For any types C and D and any function g : C → D, we require that,
mapF2g ∘ ηC = ηD ∘ mapF1g
My proof will proceed in two stages. The first lemma I will prove is:
Lemma 1
y Id = y (Env A) idA
Here Env A is the environment functor (it is also a monad) aka the reader functor, and id is the identity function. To prove our lemma it suffices to show that y Id a = y (Env A) id a for an arbitrary a : A. We will use the free theorem to prove this with a natural transformation from (Env A) to Id defined as
η1 : ∀α. Env Aα → Id α η1 := Λα. λh. ha
However, we must first prove that this is a natural transformation; i.e. we need to prove that mapIdg ∘ η1C = η1D ∘ map(Env A)g:
mapIdg (η1Ch) =
g (ha) =
(g ∘ h) a =
η1D (map(Env A)gh)
Now we can apply our free theorem using the natural transformation &eta1 to prove the lemma.
y Id a =
y Id (η1A idA) =
η1B (y (Env A) idA) =
y (Env A) idAa
This completes Lemma 1. To finish our proof of the isomorphism we only need to prove Lemma 2:
Lemma 2
mapκ (y (Env A) idA) l = yκl
To prove this we use the free theorem with a new natural transformation from Env A to κ:
η2 : ∀α. Env Aα → κα η2 := Λα. λh. (mapκhl)
Again, we need to show that η2 is a natural transformation, meaning we need to show that mapκg ∘ η2C = η2D ∘ map(Env A)g.
Now we can apply our free theorem using the natural transformation η2 to prove Lemma 2.
mapκ (y (Env A) idA) l =
η2B (y (Env A) idA) =
yκ (η2A idA) =
yκ (mapκ idA l) =
yκl
This completes the proof of the isomorphism. Perhaps it can be simplified by combining the two step into one, but I am satisfied with this two step process.
Instantiating Meaning Formulas In my last post I noted that the Trustable Communications paper was unclear about how to instantiate meaning formulas; in particular it doesn't say how much simplification is done.
After reviewing the paper I realize that the word "instantate" doesn't occur at all in the paper. In fact, it was written before the "Biform Theories in Chiron" paper, and doesn't even mention Chiron at all. Instantiation of meaning formulas is defined in the "Biform Theories in Chiron".
The function mapping inputs to instances of meaning formulas appears to take the place of formuloids in the Trustable Communications paper. Indeed the process of instantiation "maps each E in dom(Pi) to a formula of L" just as the definition of a a formuloid requires. However the Biform Theories in Chiron paper also has a further step where the instantiated formula is "reduced" to the desired form. So the mapping from inputs to reduced instantiated formula could also be the map that takes the place of formuloids.
Indeed the reduced formulas have a more pleasant form, where evals have been removed, which are closer to the outputs of the sample formuloids given in the Trustable Communications paper. The question remains, what is reduction? This is never specified in the Biform Theories in Chiron paper. There is only one example of reduction and that example only shows one step of reduction of evaluations of quoted expressions.
Should services do reduction before interpretation? How much reduction should be done? A sufficiently smart reduction machine might go ahead and reduce the entire expression down to "true" before interpretation, and we wouldn't want that to happen. Perhaps no reduction should happen after instantiation and the evals and quotes should be translated. But that could be quite a burden on the interpreter.
I'm working out a concrete example of an informed request in Chiron (see Trustable Communication Between Mathematics Systems. I'm considering an example where the source biform theory stack has at its base Peano Arithmetic whose language has 0, S, +, and *, plus an extension to define ^ and the target biform theory is a big-int theory, which I've been calling GMP even though the current implementation doesn't use GMP. The GMP language has may operators such as +, *, ^, sqrt, sqr, div, mod, and it has one symbol for every integer.
The goal here is to use the trustable communications framework to get GMP to compute the product of two "big integers", say (3^15 + 1)*(5^12 + 1). The first thing to notice is that what we have written down isn't in the language of PA. There are various ways to deal with this. We can either say that when I write 3 I really mean to write "(S (S (S 0)))" and similarly for other numerals. I could alternatively say that when I write 12 I really mean to write "(2^(2^(2^0) + 2^0) + 2^(2^(2^0)))" where 2 really means "(S (S 0))", or if we want to stay even closer to the string notation, when I write 12 I really mean to write "(1*10^(1*10^0) + 2*10^0)" where 10 really means "(S (S ... 0))" and 9 ... 1 really means .... Or for both the iterated binary and iterated decimal notations we could make an extension of PA to add symbols for the small number of digits needed. All of these choices and more are reasonable; we will not need commit ourselves to any particular choice for this discussion, but we can assume some choice is made.
GMP has a primitive multiply transformer which appears to be perfect to use as our service to compute the product (3^15 + 1)*(5^12 + 1), but that transformer's input has to be a pair of integer symbols. If the the translation of the inputs (3^15 + 1) and (5^12 + 1) have to be integer symbols, then the translation has to do the computation to evaluate (3^15 + 1) and the translator would be doing more work than the service itself!
So instead, we give GMP a service called "normalizeT". This transformer takes an expression in the term language of GMP and evaluates it to normal form, a single integer symbol. Now the translator can be as straightforward as we expect. The source language of PA is almost a subset of GMP. The only problem is with "Successor" which we simply map to "1 +" on the GMP side.
The next step in the trustable communications method is to instantiate the meaning formula for the normalize service. The meaning formula appears to be fairly trivial. It says forall e:Expr_GMP. [[normalizeT(e)]] = [[e]], and we instantiate this at (3^15 + 1)*(5^12 + 1) (note that this is the translated input, it just happens that GMP uses identical looking symbols to the language of PA). After simplifying the instance we get 3-5-0-3-1-5-1-3-8-1-5-3-6-4-0-8 = (3^15 + 1)*(5^12 + 1) where 3-5-0-3-1-5-1-3-8-1-5-3-6-4-0-8 is a single symbol in the language of GMP denoting the number "3503151381536408". Now this meaning formula instance needs to be interpreted back into the language of Peano Arithmetic. There are a few problems here.
The RHS, "(3^15 + 1)*(5^12 + 1)" is easy because it contains symbols that were translated from the language of PA, so we convert them back. The only tricky problem is that S was translated to "1 +". So we need to either interpret "1" going back as "S 0" or we need to have a sophisticated translation that finds "1 +" and replaces it with in the interpretation "S". Alternatively we could extend the language of GMP to include a successor function, but that seems a little strange because it would mean we are tailoring the target theory to our source theory, and the point of trustable communication was separation of concerns of about these two theories. The translation/interpretation link should be the only thing customized two the two interfaces. (Perhaps it is okay to make a conservative extension of the target theory to facilitate the link?) This is also pointing to a bigger potential problem. In many cases we expect the response to our query to contain our original inputs as part of the resulting formula, usually occurring on one side of an equality. However if our translation is not injective, then it will be impossible for the interpreted response always to contain the original question. Fortunately in this example the translation is injective, but will it always be so?
The LHS, 3-5-0-3-1-5-1-3-8-1-5-3-6-4-0-8 is more difficult to interpret back because it isn't in the image of the translation. Again we have several not so great choices to use for our interpretation. We could return S (S (S ... O)) which would be unfeasible in this case. We could return iterated binary or iterated decimal notatations (ie. (2^(2^(2^(2^(2^0)) + 2^0)+ ...) + ...)) or (3*10^(1*10^1 + 5*10^0) + ...)). Again any one of these choice are reasonable (although frankly, none of these solutions are all that compelling.)
One reason these solutions are not compelling is that if we want to make another request to GMP we have to translate back again. It would be nice if we could hold onto the the GMP result so that we can reuse it in a further request. One idea would be to suspend the translation. Say we have a mk_iterated_decimal transformer for interpreting GMP integer symbols into iterated decimal notation in the language of PA. We can suspend this processing and instead return [[mk_iterated_decimal(`3-5-0-3-1-5-1-3-8-1-5-3-6-4-0-8')]] or maybe run(`mk_iterated_decimal(`3-5-0-3-1-5-1-3-8-1-5-3-6-4-0-8')') or some other variant of eval/run.
There is another issue with our meaning formula for Normalize: forall e:Expr_GMP, [[normalizeT(e)]] = [[e]]. This meaning formula terribly underspecifies what normalize does. In fact, normalize could do nothing at all (the identify function) and it would still have the same meaning formula. The point of normalize is that the syntax of the output should be a single integer symbol.
A proper specification for normalize is forall e:Expr_GMP, [[normalizeT(e)]] = [[e]] /\ is_numeral(normalizeT(e)). It is the meaning formula plus an additional specification that e as a piece of syntax is a numeral. Here we are assuming that we extend the "meta-"langauge of GMP with a new predicate is_numeral. is_numeral is not a transformer because it is taking quoted syntax to values. A transformer always takes syntax to syntax. However we could imagine a transformer version of is_numeral that takes quoted syntax to the quoted syntax of Bool. In this case the specification for normalize would be written forall e:Expr_GMP, [[normalizeT(e)]] = [[e]] /\ [[is_numeralT(normalizeT(e))]].
If we take one of these specifications as the meaning formula for normalize, we are faced with the problem of interpreting this extended meaning formula back. For the first possible specification, after instantiation, we need to translate is_numeral(`3-5-0-3-1-5-1-3-8-1-5-3-6-4-0-8') back to the (meta?)langauge of PA. But we don't really have any obvious candidates on what to map this back to. The closest thing our (meta?)langauge has an is_normal_form predicate which can say that a number is in the form "(S (S (S ... O)", but that isn't really an appropriate interpretation for is_numeral. We also need to interpret `3-5-0-3-1-5-1-3-8-1-5-3-6-4-0-8' into the (meta?) language of PA. This interpretation might not be the same as our interpretation of 3-5-0-3-1-5-1-3-8-1-5-3-6-4-0-8. One expression denotes sytnax where as the other denotes semantics.
It may be unclear how to interpret this part of the specification, but we do know what the answer should be. We want to say that the result in PA is either is_unary(result), is_iterated_binary(result) or is_iterated_decimal(result) depending on what choice we made for our interpretation of GHC integer symbols. Let's say we picked iterated decimal notation. So the result of interpreting is_numeral(`3-5-0-3-1-5-1-3-8-1-5-3-6-4-0-8') should be is_iterated_decimal(interpret(`3-5-0-3-1-5-1-3-8-1-5-3-6-4-0-8')). Notice that we are using the transformer name for interpret in the output of the interpretation! Actually this is more like the suspended version. The output is more likely to be is_iterated_decimal(`3*10^(1*10^1 + 5*10^0) + ...')).
If we use [[is_numeralT(e)]] in the clause of our meaning formula, then after some simplification of our instance of the meaning formula, this reduces to true, which is easy to interpret into PA and the problem is avoided. What is unclear to me is how much simplification is done between the instantiation and the interpretation back. This simplification step isn't explicitly mentioned in the Trustable Communication paper.
Perhaps this is all pointing to a defect in our example. Remember that 3-5-0-3-1-5-1-3-8-1-5-3-6-4-0-8 is not in the image of the translations. So we are trying to use a service whose result doesn't occur in (the image of) our language in PA. Perhaps this is just a bad way to communicate. Perhaps we should only be running queries where both the input and outputs of the service are in (the image of) our source language. If we restrict ourselves to such queries, do our problems go away? Is this too restrictive for the types of queries we wish to make?
The Chiron implementation has a function called run which is supposed to resemble the functionality of when a user passes an input to a top command into a computer algebra system and some sort of answer is spit out at the bottom. A central idea of Mathscheme is that this computation should be done through a series of transformers. But getting from user input to output via transformers is not really as clear as one might think.
factor should produce lists of (polynomial, natural number) pairs and ifactor should produce similar lists with integers.
expand should produce a polynomial as a linear combination in the power basis. 6*35 should produce 210 and it is unclear to me what (X^2 - 1)*(X^2 + 1) should produce. Maple, for example, will spit it back out to you.
The question is, how do we get from these inputs to these output via transformers? What are the names of the symbols in the input referring to? Are they transformers, or are they mathematical concepts?
Take "6*35" as an example. The * here seems like it refers to the concept of integer multiplication. However, the concept of multiplication cannot actually multiply strings of digits and return an answer. Only a transformer can do that. So perhaps * is the name of the transformer for multiplication. However, transformers must take syntax as parameters, not numbers.
There are two possible ways to deal with this. If * is the concept of multiplication then run needs to look up the concept of * in the environment then substitute it with an associated transformer and running the transformer to get 210. If * is a transformer, then the actual input is really going to be "`6' *T `35'" and run is simply finding the transformer whose name is *T and evaluating on the inputs `6' and `35' resulting in `210'.
The other examples lead to some insight. factor looks more like a transformer than a concept, though a concept could be created for it. expand definitely appears to be a transformer. There isn't really a mathematical concept associated with it. So this suggests that the user input consists of transformer names and not concepts.
The problem is since factor is a transformer (let's call it factorT from now on to make it clear that it is a transformer name, though in practice the name would really be "factor") it is expecting syntax as its input. This means the user input needs to really be "factorT(`(X^2 - 1)*(X^2 + 1)')".
Why do we quote the argument to factorT? Do we just quote the input to all transformers? No, because we can chain transformers together. The input to the factor-expand example should really be "factorT(expandT(`(X^2 - 1)*(X^2 + 1)'))" and neither "factorT(`expandT((X^2 - 1)*(X^2 + 1))')" nor "factorT(`expandT(`(X^2 - 1)*(X^2 + 1)')')". The last two don't even typecheck because "expand" isn't part of the language of syntax that factorT accepts.
So why do we push the quotes only in as far as `(X^2 - 1)*(X^2 + 1)' and not any further. Shouldn't we be doing `(X^2 - 1)' *T `(X^2 + 1)'. In fact, shouldn't we push further down and use -T and ^T as well? We really don't want to push the quotes further in because if *T, say, multiplies two polynomials out into an expanded form, then factor has to undo all that work to get the factorization. If factor is smart, and we actually pass the syntax of a product of polynomials, then it can factor each factor independently and merge the results, which is much more efficient. Similarly for "ifactorT(`6*35')", if we don't run a "`6' *T `35'" transformation, then a smart implementation of ifactorT will have a much easier time.
The key to solving this conundrum is that there are a set of trivial transformers for * that simply term constructors. In particular there is a transformer named &*T that will take `6' and `35' and return `6*35'. This is distinct from the transformer named *T that takes `6' and `35' and returns `210'. In fact, these two transformers don't even have the same type. *T operates on a language L1 which is essentially [0-9]+, while &*T operates on a language L2 which is a term algebra for integers with symbols for + and * and whatnot.
Notice that &*T and *T both have approximately the same meaning formula. The meaning formula for *T is "forall x,y : E_L1. [[x *T y]] = [[x]] * [[y]]" and the meaning formula for &*T is "forall x,y : E_L2. [[x &*T y]] = [[x]] * [[y]]". It is now clear that meaning formulas are not a complete specification of the transformers. Transformers could produce lots of different syntax that all denote the same result, and meaning formulas of this form do not distinguish between them. I'm not saying this is a bad thing. The purpose of meaning formulas may not be to fully specify the transformer behaviour; however if meaning formulas are intended as specification, then a lot more details need to be put in them than currently appear in the Chiron-Biform theories paper.
So with two different transformers in hand, both "representing" multiplication in different ways we can now start to disambiguate what is going on. There are really two different possible user inputs that correspond to "ifactor(6*35)". The user could ask for "ifactorT('6' *T '35')" in which case the computer will first multiply 6 and 35 together to get 210 before then factoring 210. Alternatively the user could ask for "ifactorT('6' &*T '35'), in which case the computer would first combine '6' and '35' into the syntax tree '6*35' and pass that tree to ifactor which would hopefully be clever enough to factor 6 and 35 independently and merge the results. Notice that the results of both functions are the same. Only their operational behaviour is different. We could even have two different ifactorT transformers. One simple version, ifactorSimpleT, that only understands the language L1, and one clever version, ifactorCleverT, that accepts a broader language L2 and can understand how to factor '6*35' without first multiplying them together.
The conclusion from all this is that run is always given an expression of transformers, although some of them might be trivial transformers. The leaves of the expressions must be quoted constants, or alternatively they could be really trivial transformers that take no parameters and simply return a quoted constant (depending on your formalization these two ideas may simply be identical). If we consider run as a transformer itself, then its input is the syntax of the language of syntax transformers. In this sense I say that run operates on meta-syntax, and that run is a meta-transformer. This is no problem in Chiron which can formalize meta syntax in its syntax. The result of run is regular syntax of some language, which depends on the transformers used. So run maps meta-syntax to syntax.
This means that the output of the result of run cannot be used as the input to run, because they are of different types. However there is always a trivial injection from syntax- to meta-syntax that replaces each operation of the language with the trivial transformer that inserts the corresponding operation into the syntax i.e. "6*35" can be lifted to "'6' &*T '35'".
I don't promise that everything I write here will be correct. All polynomials we will consider are assumed to be seperable. All types are going to have decidable equality and choice. The base field F is assumed to be splitable, which means it is decidable if a polynomial over F is irreducible or not. Since we will only consider finite extensions of F they should all be splitable as well.
Suppose we are given a polynomial f ∈ F[x] of degree n. How do we compute the Galois group of f? Assume we are also given a splitting field E = F(α1,…,αn) of f. The traditional approach is to define the Galois group to be the automorphisms of E that fix F. This is unsuitable for formalization in Coq because the carrier are functions, which leads to all sorts of issues about undecidable equality and functional extensionality that we would rather not deal with because, after all, the Galois group is going to be a nice finite set.
The obvious candidate to represent the Galois group is some subgroup of the symmetric group on {α1,…,αn}, but which subgroup do we consider? The modern mathematical answer is simply “the one that generates automorphisms of E”, but this isn't a priori a decidable predicate.
The classical answer is to first prove some form of the primitive element theorem. We need to find an element β ∈ E such that F(β) = E. We need to compute the minimum polynomial of β, μ ∈ F[x] of degree m. Once we do this things are looking much easier. This means that the powers of β, e.g. β0,…,βm-1, form a basis for E.
Let β1,…βm be all the roots of μ. We also need to compute rational functions θ1(x),…,&thetam(x) ∈ F(x) such that each root of the μ can be computed from β, e.g. βi = θi(β). (Actually we can find polynomials &thetai, but I say rational functions here because in some cases it might be easier to compute them using Lagrange’s rational function theorem.)
Since β generates E then E is also the splitting field of μ, so the Galois group of μ is going to be the same as the Galois group of f. But μ is a much nicer polynomial than f because when one root of μ is added to F then the entire polynomial factors completely. In particular, adding β to F causes μ to factor completely because we can compute each other root of μ as a rational function of β using the θi’s.
The Galois permutations of the roots of μ is also easy to compute. First choose any element of {β1,…,βm} for β to map to, say you pick βj. Then the θi’s dictate how the rest of the roots must be permuted. In particular βi = θi(β) must be mapped to θi(βj). The field automorphism associated with this choice is also easy to see since E = F(β), then an automorphism fixing F is completely defined by its behaviour on β.
So the Galois group is specified by which root β is mapped to. There were m choices available so the Galois group has size m. This is great because it is also the dimension of the field extension E as it ought to be.
Thus we are left with the question of how to compute β. One way of finding a suitable β is to look for a linear combinations of α1,…,&alphan that work. There is only a finite number of combinations that won't work, and it seems to be decidable which combinations will or won't work. This is great when F is infinite. When F is finite it is also easy to find β because it will be a generator of the cyclic multiplicative group of E. This this procedure only works on fields that are either finite or infinite. Perhaps there is more general procedure for finding β, but this restriction may not be too severe for most purposes.
One can also follow Galois’s original approach to find a minimal polynomial for β without using the field extension E. At least assuming F has characteristic 0 (perhaps it works more generally) and f is irreducible, one can find a Galois resolvant, V, which is a linear multivariable polynomial in n-variables. It is generated in such a way that V(α1,…,αn) is a suitable β, but one can compute V only using the field F. One can compute a symmetrized version of this Galois resolvant as a polynomial in n+1 variables. One can evaluate this symmetrized polynomial at (α1,…,αn) leaving you with a polynomial V* ∈ F[x]. It is in F[x] because of symmetry of the original multivariable polynomial. V* is a n! degree polynomial and every root will generate the splitting field E. Thus one can factor V* in to several m degree irreducible polynomials and each one would make a suitable minimal polynomial μ. Finally, we have manage to construct μ without working in the splitting field E. This is nice because we can now construct a splitting field as F[x]/μ and represent the Galois group by the roots of μ in F[x]/μ. Obviously I'm quite sketchy on the details in this last paragraph. A little more research is needed here.
(* Reynold's theorem states that there are no set theoretic models of System F
* System F can be intepreted in impredicative set. Thus impredicative set
* plus functional extentionality plus Streicher's axiom K has no set theoretic
* models. This is all well known.
*)
Definition T (x:Set) : Set := (x -> bool) -> bool.
Definition map (a b:Set) (f: a -> b) (m : T a) : T b := fun c => m (fun x => c (f x)).
Definition P : Set := forall k:Set, (T k -> k) -> k.
Definition rho (A:Set) (f: T A -> A) (p:P) : A := p A f.
Definition H (q:T P) : P := fun k f => f (map P k (rho k f) q).
Lemma rho_comm : forall A f x, f (map _ _ (rho A f) x) = (rho A f) (H x).
compute.
reflexivity.
Qed.
Definition P' := {p : P | forall (A B:Set) (a: A -> B) (f1: T A -> A) (f2: T B -> B), (forall x, a (f1 x) = f2 (map A B a x)) -> rho B f2 p = a (rho A f1 p) }.
Require Import FunctionalExtensionality.
Definition J (x:P') : P := let (y,_) := x in y.
Definition H' (q:T P') : P'.
intros q.
exists (H (map P' P J q)).
intros.
assert (HJ : forall x, (rho B f2 (J x)) = a (rho A f1 (J x))).
intros [x' Hx'].
simpl.
apply Hx'.
apply H0.
unfold rho at 1.
unfold H at 1.
unfold rho at 2.
unfold H at 1.
rewrite H0.
unfold map.
apply f_equal.
extensionality x.
apply f_equal.
extensionality y.
apply f_equal.
apply HJ.
Defined.
Definition rho' (A:Set) f (x:P') : A := rho A f (J x).
Definition rho0' (x:P') := rho' P' H' x.
Lemma rho0'_comm : forall x, H' (map P' P' rho0' x) = rho0' (H' x).
Proof.
reflexivity.
Qed.
Lemma ThreeA : forall A (f: T A -> A) (theta: P' -> A),
(forall x, f (map P' A theta x) = theta (H' x)) -> forall z, rho' A f z = theta (rho0' z).
Proof.
intros A f theta Hf [z Hz].
simpl in *.
unfold rho in Hz.
apply Hz.
symmetry.
apply Hf.
Qed.
Lemma ThreeB : forall A (f: T A -> A) (theta: P' -> A) x,
f (map P' A (rho' A f) x) = rho' A f (H' x).
Proof.
reflexivity.
Qed.
Lemma rho0'_idemp : forall x, rho0' (rho0' x) = rho0' x.
symmetry.
apply ThreeA.
reflexivity.
Qed.
Definition P'' := {p | rho0' p = p}.
Definition Gamma (x:P') : P''.
intros x.
exists (rho0' x).
apply rho0'_idemp.
Defined.
Definition Kappa (x:P'') : P' := let (y,_) := x in y.
Definition H'' (q:T P'') : P'' := Gamma (H' (map P'' P' Kappa q)).
Definition rho'' (A:Set) f (x:P'') : A := rho' A f (Kappa x).
Lemma KappaGamma : (fun x => Kappa(Gamma x)) = rho0'.
extensionality x.
reflexivity.
Qed.
Require Eqdep.
Lemma GammaKappa : forall x, Gamma (Kappa x) = x.
intros [x Hx].
simpl in *.
unfold Gamma.
generalize (rho0'_idemp x).
rewrite Hx.
intros e.
replace e with Hx.
reflexivity.
apply Eqdep.EqdepTheory.UIP.
Qed.
Lemma F : forall x, Kappa (H'' x) = H' (map P'' P' Kappa x).
Proof.
intros x.
unfold H''.
change (rho0' (H' (map P'' P' Kappa x)) = H' (map P'' P' Kappa x)).
rewrite <- rho0'_comm.
rewrite <- KappaGamma.
change (H' (map P'' P' (fun x0 : P'' => Kappa (Gamma (Kappa x0))) x) =
H' (map P'' P' Kappa x)).
replace (fun x0 : P'' => Kappa (Gamma (Kappa x0))) with Kappa.
reflexivity.
symmetry.
extensionality y.
rewrite GammaKappa.
reflexivity.
Qed.
Lemma Gish : forall x,Gamma (H' x) = H'' (map P' P'' Gamma x).
Proof.
intros x.
rewrite <- (GammaKappa (Gamma (H' x))).
change (Kappa (Gamma (H' x))) with (rho0' (H' x)).
rewrite <- rho0'_comm.
rewrite <- KappaGamma.
change (map P' P' (fun x0 : P' => Kappa (Gamma x0)) x)
with (map _ _ Kappa (map _ _ Gamma x)).
rewrite <- F.
rewrite GammaKappa.
reflexivity.
Qed.
Lemma D1 : forall A (f: T A -> A) (theta: P'' -> A),
(forall x,f (map P'' A (fun x =>theta x) x) = theta (H'' x)) ->
forall z, rho'' A f z = theta z.
Proof.
intros A f theta H0 z.
unfold rho''.
rewrite (ThreeA A f (fun x => theta (Gamma x))).
rewrite <- KappaGamma.
do 2 rewrite GammaKappa.
reflexivity.
intros x.
change (map P' A (fun x0 : P' => theta (Gamma x0)) x)
with (map _ _ (fun x0 : P'' => theta x0) (map _ _ Gamma x)).
rewrite H0.
apply f_equal.
symmetry.
apply Gish.
Qed.
Lemma D2 : forall A f x, f (map P'' A (rho'' A f) x) = (rho'' A f (H'' x)).
Proof.
intros A f x.
unfold rho''.
rewrite F.
reflexivity.
Qed.
Definition G'' (x:P'') : T P'' := rho'' (T P'') (map (T P'') P'' H'') x.
Lemma TGH : forall x, (G'' (H'' x)) = map _ _ H'' (map _ _ G'' x).
Proof.
symmetry.
apply D2.
Qed.
Lemma iso1 : forall x, (H'' (G'' x)) = x.
intros x.
transitivity (rho'' P'' H'' x).
symmetry.
change (H'' (G'' x)) with ((fun x => (H'' (G'' x))) x).
apply D1.
intros y.
rewrite TGH.
reflexivity.
change (rho'' P'' H'' x = (fun x => x) x).
apply D1.
intros y.
reflexivity.
Qed.
Lemma iso2 : forall x, (G'' (H'' x)) = x.
Proof.
intros x.
rewrite TGH.
change (map _ _ (fun x => H'' (G'' x)) x = x).
replace (fun x => (H'' (G'' x))) with (fun x:P'' => x).
extensionality y.
unfold map.
apply f_equal.
extensionality z.
reflexivity.
extensionality y.
symmetry.
apply iso1.
Qed.
Record ReynoldsStatement := {
A : Set;
f : ((A -> bool) -> bool) -> A;
g : A -> (A -> bool) -> bool;
isoA : forall x, f (g x) = x;
isoB : forall x, g (f x) = x
}.
Theorem Reynolds : ReynoldsStatement.
exists P'' H'' G''.
apply iso1.
apply iso2.
Defined.
I was reading Burstall and Goguen’s paper on The semantics of clear, a specification language. I was looking to understand the theory behind using colimits to combine signatures over a common base (aka the diamond inheritance problem) which we are trying to use in the MathScheme project.
I found this to be a great paper to get me to understand the basics of the subject of Algebraic Foundations of System Specifications. But I was really surprised that they don't use colimits, specifically pushouts, to do sharing with inheritance. Instead the pushouts are only used to procedure application. To do sharing they have some complicated based theory stuff to get sharing. Technically this based theory stuff ends up using colimits to do the sharing but it all seems needlessly complicated since you could have done a pushout from the very beginning. My best guess at the reason they do all this based-theory stuff is so they can capture the required arrows to form a pushout by just "naming the endpoint", which is not really possible to do in the simpler setting because there could be multiple arrows connecting the endpoints.
I believe our current plan for MathScheme is to somehow specify which arrows we want in our pushouts, and hopefully we can do this with very light syntax (at least for common cases).
I claimed that all classical propositions have at most one proof upto extensionality. My supervisor wanted to know if I could prove, or even state it in Coq. Below is my proof. I add an extensionality axiom because Coq doesn't have an extensional equality. In some sense this proof is incomplete, because there are other classical atomic formulas other than equality on natural numbers. However this does cover all of classical higher order arithmetic, which is probably enough to illustrate my point.
Require Import Eqdep_dec.
Require Import Logic.FunctionalExtensionality.
Inductive IsClassic : Prop -> Type (* could, in principle, be Prop -> Prop *) :=
| isTrue : IsClassic True
| isFalse : IsClassic False
| isNatEq : forall n m:nat, IsClassic (n = m)
| isAnd : forall A B, IsClassic A -> IsClassic B -> IsClassic (A/\B)
| isImpl : forall A B, IsClassic B -> IsClassic (A -> B)
| isForall : forall A B, (forall a:A, IsClassic (B a)) -> IsClassic (forall a:A, B a).
Lemma ClassicProofIrrelevent : forall P, IsClassic P -> forall p q : P, p = q.
Proof.
intros P.
induction 1.
intros [] []; reflexivity.
intros [].
intros p; generalize p.
rewrite p; clear p.
intros p.
apply K_dec_set.
decide equality.
symmetry.
apply K_dec_set.
decide equality.
reflexivity.
intros [p0 p1] [q0 q1].
rewrite (IHX1 p0 q0), (IHX2 p1 q1).
reflexivity.
auto using @functional_extensionality_dep.
auto using @functional_extensionality_dep.
Qed.
I was conjecturing that the applicative functor interface to the reader monad was nice to work with because the reader monad is a commutative monad. However, Saizan suggested it was more than that.
One key function that the applicative interface cannot define is miffy (from Conor's paper) where
miffy :: Monad m => m Bool -> m a -> m a -> m a
miffy mb mx my = do
b <- mb
if b then mx else my
The best that the applicative functor can define is iffy where
iffy = liftA3 (\b x y -> if b then x else y)
The problem is that iffy always runs the "effects" of the applicative functor for both branches of the "if" statement. Thus iffy (Just True) (Just ()) Nothing reduces to Nothing while miffy (Just True) (Just ()) Nothing is Just ().
The reader monad enjoys the property that miffy is equivalent to iffy, so nothing is lost here by using the applicative functor interface.
This is probably the reason that the applicative functor interface to our step function monad is so pleasant to work with.