The Principle of Structural Induction: 1 Datatypes As Term Sets
Dilian Gurov
KTH Royal Institute of Technology, Stockholm
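For example, the datatype of integer lists can be defined by the following
BNF grammar:

IntList ::= empty | cons(Int, IntList)

Here, the datatype IntList is defined in terms of the datatype Int, the
definition of which is omitted, and recursively in terms of itself. The
function symbols empty and cons are the so-called constructors of the
datatype, of arity 0 and 2, respectively.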
Intuitively, the definition says that an integer list is either empty or
else a construction consisting of an integer and an integer list; the first
argument defines the head of the list, while the second argument defines
its tail. The rules are only allowed to be applied a finite number of times,
hence all lists are finite. Three examples of integer lists are:
empty
cons(11, empty)
cons(−7, cons(11, empty))
The concrete syntax for such lists varies from one programming language to
another. For instance, in Prolog these three lists would be written as:
[]    [11|[]]    [-7|[11|[]]]
or, formatted for better readability, as:
[]    [11]    [-7, 11]
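The term view of integer lists can also be mirrored outside Prolog. The following is a minimal sketch in Python, assuming two frozen dataclasses, Empty and Cons, as stand-ins for the constructors empty and cons; these class names and the variables ex1 to ex3 are illustrative choices, not part of the notes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """The constructor `empty` (arity 0)."""


@dataclass(frozen=True)
class Cons:
    """The constructor `cons` (arity 2): a head integer and a tail list."""
    head: int
    tail: "IntList"


# An IntList is a term built from exactly these two constructors.
IntList = Union[Empty, Cons]

# The three example lists from the text (variable names are illustrative).
ex1 = Empty()
ex2 = Cons(11, Empty())
ex3 = Cons(-7, Cons(11, Empty()))
```

Each value is a finite nesting of constructor applications, which matches the requirement that the formation rules are applied only finitely many times.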
Definition by Structural Induction. Structural induction prescribes how
to define a function over all terms of a given datatype: the value of the
function on a term is defined in terms of the values of the same function on
the immediate subterms (of the same datatype) of that term. As we stressed
above, we shall use the fact that BNF grammars make the immediate subterm
relation explicit. And since this relation is well-founded for such inductive
definitions, the principle guarantees that such functions are well-defined,
in the sense that they define a value for every element of the datatype.
Let’s say we want to define formally the notion of list length for all integer
lists. For this we introduce a unary function symbol length. To follow the
principle of structural induction (for datatypes defined inductively through
BNF grammars) would mean that we have to make two defining clauses that
use a variable l to range over arbitrary integer lists:
• for the case l = empty, we have to express length(l) directly, without
referring to length again, as empty has no immediate sublists;
• for the case l = cons(k, l′) for some integer k and integer list l′, we are
allowed to use length(l′) in the definition of length(l).
In other words, we have to reduce the definition of length of a (non-empty)
integer list to the length of the tail of the list. For instance, the definition:
length(empty) = 0
length(cons(k, l′)) = length(l′) + 1
follows the principle. In Prolog, the definition of list length would use a
predicate, and would look as follows:
length([], 0).
length([_ | T], N) :- length(T, NT), N is NT + 1.
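For comparison, the same two defining clauses can be written as a function rather than a predicate. This is a sketch only, in Python, assuming hypothetical classes Empty and Cons that stand for the constructors empty and cons; one branch corresponds to each formation rule.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """The constructor `empty`."""


@dataclass(frozen=True)
class Cons:
    """The constructor `cons(head, tail)`."""
    head: int
    tail: "IntList"


IntList = Union[Empty, Cons]


def length(lst: IntList) -> int:
    # Clause for `empty`: no immediate sublist, so the value is given directly.
    if isinstance(lst, Empty):
        return 0
    # Clause for `cons(k, tail)`: only length(tail) may be used here.
    return length(lst.tail) + 1
```

Calling length(Cons(-7, Cons(11, Empty()))) evaluates the second clause twice and the first clause once.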
Note that the definition need not be correct just because we followed the
principle (for example, we could have defined the length of the empty list
as 1 and then the definition would not capture the intended notion), but the
principle guarantees that the function is well-defined for all integer lists. So,
we can compute the length of the list cons(−7, cons(11, empty)) as follows,
by applying the defining clauses:
length(cons(−7, cons(11, empty)))
= length(cons(11, empty)) + 1
= length(empty) + 1 + 1
= 0 + 1 + 1
= 2
always reducing the computation to the tail of the “current” list. Again,
as the sublist relation is well-founded, the computation is guaranteed to
terminate for any integer list.
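The unfolding above can be reproduced mechanically. The sketch below, in Python, assumes hypothetical classes Empty and Cons for the two constructors and two helper functions, show and length_trace, introduced here only for illustration. Each pass of the loop moves to the immediate sublist, so the number of passes equals the number of cons constructors in the term.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Empty:
    """The constructor `empty`."""


@dataclass(frozen=True)
class Cons:
    """The constructor `cons(head, tail)`."""
    head: int
    tail: "IntList"


IntList = Union[Empty, Cons]


def show(lst: IntList) -> str:
    """Render a list as a term; itself defined by structural induction."""
    if isinstance(lst, Empty):
        return "empty"
    return f"cons({lst.head}, {show(lst.tail)})"


def length_trace(lst: IntList) -> List[str]:
    """Return the successive expressions obtained by unfolding `length`."""
    steps = []
    pending = 0  # number of "+ 1" terms accumulated so far
    current = lst
    while isinstance(current, Cons):
        steps.append(f"length({show(current)})" + " + 1" * pending)
        current = current.tail  # move to the immediate sublist
        pending += 1
    steps.append("length(empty)" + " + 1" * pending)
    steps.append("0" + " + 1" * pending)
    if pending > 0:
        steps.append(str(pending))  # the fully evaluated sum
    return steps
```

For the list cons(−7, cons(11, empty)), the returned steps coincide with the five lines of the computation shown in the text.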
In summary, we must have one defining clause per formation rule
of the BNF grammar, and reduction is always to the immediate subterms
uniquely determined by the formation rule at hand.
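The "one clause per formation rule" discipline can be packaged once and reused. The following Python sketch assumes hypothetical classes Empty and Cons for the constructors and a helper named fold_list; the function total (the sum of the elements) is an additional illustration that does not appear in the notes.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar, Union

R = TypeVar("R")


@dataclass(frozen=True)
class Empty:
    """The constructor `empty`."""


@dataclass(frozen=True)
class Cons:
    """The constructor `cons(head, tail)`."""
    head: int
    tail: "IntList"


IntList = Union[Empty, Cons]


def fold_list(lst: IntList, empty_case: R, cons_case: Callable[[int, R], R]) -> R:
    """Structural recursion over IntList: one argument per formation rule."""
    if isinstance(lst, Empty):
        return empty_case
    # The recursive call is made only on the immediate sublist.
    return cons_case(lst.head, fold_list(lst.tail, empty_case, cons_case))


def length(lst: IntList) -> int:
    # empty gives 0; cons(k, tail) gives length(tail) + 1
    return fold_list(lst, 0, lambda k, rest: rest + 1)


def total(lst: IntList) -> int:
    # empty gives 0; cons(k, tail) gives k + total(tail)
    return fold_list(lst, 0, lambda k, rest: k + rest)
```

Any function obtained by supplying the two cases to fold_list yields a value for every finite list, whether or not that value captures the intended notion.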
Here is an example of mutually recursive datatypes: integer trees in which
each node can have an arbitrary number of children, represented as a list
of trees. We can define this datatype with the following BNF grammar:

TreeList ::= empty | cons(Tree, TreeList)
Tree ::= leaf(Int) | tree(TreeList)
We have here two mutually recursive datatypes, TreeList and Tree, each
defined by two formation rules that use different constructors (empty and
cons for the first datatype, and leaf and tree for the second).
Now, if we want to define, by structural induction, a function on the
datatype TreeList that gives the total number of leaves of the trees in a
tree list, we could introduce a unary function symbol numleaves, and write:
numleaves(empty) = 0
numleaves(cons(t, tl)) = numls(t) + numleaves(tl)
where we introduce a separate (!) function numls for the Tree datatype,
which we also define by structural induction:
numls(leaf(k)) = 1
numls(tree(tl)) = numleaves(tl)
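The two mutually recursive definitions can be sketched in Python as follows. The class names EmptyTL, ConsTL, Leaf and Node are assumptions made for this illustration (Node stands for the constructor tree, and the list constructors are renamed to keep them apart from those of integer lists); the function names numleaves and numls follow the text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EmptyTL:
    """The constructor `empty` of TreeList."""


@dataclass(frozen=True)
class ConsTL:
    """The constructor `cons(t, tl)` of TreeList."""
    head: "Tree"
    tail: "TreeList"


@dataclass(frozen=True)
class Leaf:
    """The constructor `leaf(k)` of Tree."""
    value: int


@dataclass(frozen=True)
class Node:
    """The constructor `tree(tl)` of Tree."""
    children: "TreeList"


TreeList = Union[EmptyTL, ConsTL]
Tree = Union[Leaf, Node]


def numleaves(tl: TreeList) -> int:
    """Number of leaves in all trees of a tree list."""
    if isinstance(tl, EmptyTL):
        return 0
    return numls(tl.head) + numleaves(tl.tail)


def numls(t: Tree) -> int:
    """Number of leaves of a single tree."""
    if isinstance(t, Leaf):
        return 1
    return numleaves(t.children)


# A tree whose root has three children: leaf 1, a subtree holding leaf 3, leaf 2.
example = Node(
    ConsTL(Leaf(1),
           ConsTL(Node(ConsTL(Leaf(3), EmptyTL())),
                  ConsTL(Leaf(2), EmptyTL())))
)
```

Each function has one branch per formation rule of its own datatype and calls the other function only on an immediate subterm of the other datatype. Under these clauses, a tree with an empty list of children, tree(empty), has zero leaves.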