Alistair's Thesis
Work Completed
Exceeded all success criteria and completed all extensions. Dromedary supports ML polymorphism, ADTs, patterns, records, side-effecting primitives, mutually recursive let-bindings and type definitions, GADTs, polymorphic variants, extensible variants, semi-explicit first-class polymorphism, type abbreviations, and structures. I formally defined Dromedary and its type system in a constraint-based setting. I developed a sufficiently expressive constraint language, with novel extensions to existing work. I implemented a modular and efficient constraint-based type inference algorithm for Dromedary, which is as permissive as OCaml's and more performant.
Special Difficulties
None.
¹ This word count was computed using texcount.
² This code line count was computed using cloc (excluding autogenerated test output).
Contents
1 Introduction
1.1 OCaml
1.2 Previous Work
1.3 Project Summary
2 Preparation
2.1 Type Systems
2.1.1 The ML Type System
2.1.2 Constraints
2.1.3 Constraint-Based ML: PCB
2.2 OCaml
2.2.1 Modules and Functors
2.2.2 Functors, Applicatives and Monads
2.2.3 Generalised Algebraic Data Types
2.2.4 Polymorphic Recursion
2.2.5 Semi-Explicit First-Class Polymorphism
2.2.6 Polymorphic Variants
2.3 Requirements Analysis
2.3.1 Model of Software Development
2.3.2 Tools Used
2.3.3 License
2.4 Starting Point
2.5 Summary
3 Implementation
3.1 Dromedary
3.1.1 Algebraic Data Types
3.1.2 Annotations and Polymorphic Recursion
3.1.3 Semi-explicit First-class Polymorphism
3.1.4 Sharing
3.1.5 Polymorphic Variants
3.1.6 Generalised Algebraic Data Types
3.2 Inference Implementation
3.2.1 Repository Overview
3.2.2 Constraints and Type Reconstruction
3.2.3 Typing and Constraint Generation
3.3 Summary
4 Evaluation
4.1 Project Requirements and Success Criteria
4.2 Permissiveness of Dromedary
4.3 Benchmarks
4.4 Summary
5 Conclusions
5.1 Future Work
5.2 Lessons Learnt
Bibliography
A Untyped Syntax
B Constraints
C Type System
D Computations
E Proposal
List of Figures
1.1 An overview of the constraint-based inference pipeline.
4.1 Benchmarks of various programs using 10000 trials. A subset from the corpus is used for permissiveness testing. Error bars represent ±2σ.
4.2 Benchmarks comparing Dromedary and OCaml’s asymptotic behaviour in classical exponential cases for ML inference. Shaded areas represent the 95% confidence interval (±2σ). 10000 trials for (a), 200 trials for (b).
List of Listings
2.1 ML let-based polymorphism in action – fun-bound variables are monomorphic, whereas let-bound variables are polymorphic.
2.2 The type definitions for a simple language using algebraic data types in OCaml – expr and bin_op are variant types and binding is a record type.
2.3 A snippet demonstrating OCaml’s module structures and signatures.
2.4 An interpreter for the simple language from Listing 2.2. The implementation of eval uses mutual recursion and labelled arguments.
2.5 The signatures for functors, applicatives, and monads.
2.6 The type definition of a simple DSL in OCaml using ADTs and GADTs.
2.7 The definition of the equality GADT in OCaml – the type ('a, 'b) eq encodes a “proof” that 'a is equal to 'b.
2.8 A type definition for perfect trees in OCaml, taken from [36].
2.9 An example of polymorphic recursion in OCaml – requiring an explicit polymorphic annotation for decidable type inference.
2.10 The type definition of a dependent associative list in OCaml using GADTs.
2.11 A demonstration of semi-explicit first-class polymorphism in OCaml, encoding the polymorphic type ∀α.α key → α → α in the elem_mapper type.
2.12 Extensible error types using polymorphic variants in OCaml, taken from the project implementation.
3.1 Examples of annotations in OCaml (on the left) and Dromedary (on the right), illustrating the differences in the introduction of bounded type variables in expressions.
3.2 The type definition of the 'a expr GADT in Dromedary – new syntax was introduced for existential variables and explicit constraints.
3.3 Desugared (left) versus ppx_let syntax (right) for applicatives (and monads).
3.4 A snippet of the Constraints library interface.
3.5 The module signature for Dromedary’s implementation of the union-find data structure.
3.6 The module signature for first-order unification structures.
3.7 An example of a composable unification structure using OCaml’s functors – the structure First_order extends a structure S adding (uni-sorted) variables.
3.8 A snippet of Dromedary’s constraint generation illustrating the usage of constraints, computations, and binders for clear, compositional, and maintainable code.
1 Introduction
Since the late 1950s, many popular programming languages have adopted type systems and type checkers. Type checkers give the assurance of type safety: “well-typed programs cannot go wrong” [33]; that is, a well-typed program is guaranteed not to violate the properties enforced by its type system at runtime. One problem with many statically typed languages is that they require programmers to annotate their programs with types. Type inference algorithms alleviate this issue by inferring types rather than requiring the programmer to provide them.
Type inference for functional programming languages such as Standard ML, Haskell and
Objective Caml (OCaml) is based on the ML calculus defined by Milner [33], which provides
decidable type inference for let-based polymorphism. Traditionally, inference algorithms for these
languages are extensions of algorithms W or J [33], which use partial substitutions to reason
about first-order equalities between types. However, these algorithms can become extremely
complicated when extending the ML language with additional features.
The purpose of this dissertation is to investigate type inference algorithms using a constraint-
based approach, specifically in the context of OCaml, to reduce the complexity introduced by
these additional features.
1.1 OCaml
OCaml, introduced by Leroy [29], is a popular functional programming language with an
advanced type system. The core language (referred to as Core ML) extends ML with the
following features: mutually recursive let-bindings, algebraic data types, patterns, constants,
records, mutable references (and the value restriction), exceptions and type annotations.
OCaml’s major extensions on Core ML consist of first-class and recursive modules, classes and
objects, polymorphic variants, semi-explicit first-class polymorphism, generalised algebraic data
types (GADTs), the relaxed value restriction, type abbreviations, and labels.
OCaml’s inference algorithm relies on an extension of algorithm W to efficiently deal with
type generalisation, using a technique known as rank-based generalisation [41], with additional
modifications for the above extensions.
However, it is widely acknowledged that OCaml’s inference algorithm has become overly complex and difficult to maintain and evolve [52]. Constraint-based type inference addresses these problems by separating type inference into three distinct phases (constraint generation, constraint solving, and type reconstruction), connected by a small, independent, first-order constraint language. This makes inference algorithms and their proofs of correctness more modular.
The idea behind constraint-based type inference is elegantly simple: for an arbitrary term M, we generate a constraint C such that if C is satisfiable, then M is well-typed. After solving C, we construct M:τ, an explicitly typed representation of the term M, during type reconstruction.
In this dissertation, we implement a type inference algorithm using a constraint-based approach
for a subset of OCaml, dubbed Dromedary, consisting of Core ML with type annotations, poly-
morphic variants, extensible variants, semi-explicit first-class polymorphism, type abbreviations,
structures, and GADTs.
¹ In the implemented extensions.
2 Preparation
In this chapter, we summarise the key background material for this project. In Section 2.1, we
provide a detailed account of the existing theory for constraint-based type systems. Following
that, we present a tutorial showcasing various features in OCaml that are implemented in
Dromedary. Section 2.3 details the requirements of Dromedary’s implementation and professional
practices followed throughout the project.
• Generalisation: The process of converting a monomorphic type τ (or monotype) into a type scheme σ by universally quantifying the free variables ᾱ of τ that are not free in Γ, yielding ∀ᾱ.τ; this implicitly occurs in the ML-let rule.
• Instantiation: The process of specialising a type scheme ∀ᾱ.τ into a type by substituting types τ̄ for the universally bound type variables ᾱ, written {τ̄/ᾱ}τ. This implicitly occurs in the ML-var and ML-const rules.
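$$
\frac{x : \forall\bar\alpha.\tau \in \Gamma}{\Gamma \vdash x : \{\bar\tau/\bar\alpha\}\tau}\;\text{(ML-var)}
\qquad
\frac{c : \forall\bar\alpha.\tau \in \Delta}{\Gamma \vdash c : \{\bar\tau/\bar\alpha\}\tau}\;\text{(ML-const)}
$$
$$
\frac{\Gamma \vdash e_1 : \tau_1 \to \tau_2 \qquad \Gamma \vdash e_2 : \tau_1}{\Gamma \vdash e_1\, e_2 : \tau_2}\;\text{(ML-app)}
\qquad
\frac{\Gamma, x : \tau_1 \vdash e : \tau_2}{\Gamma \vdash \mathsf{fun}\ x \to e : \tau_1 \to \tau_2}\;\text{(ML-fun)}
$$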
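To make generalisation and instantiation concrete, the following OCaml sketch implements both operations over a toy type language. The constructors TVar, Arrow and Int and the functions generalise and instantiate are hypothetical names chosen for illustration, not part of Dromedary; the context Γ is modelled as a plain association list from term variables to schemes.

```ocaml
(* A toy type language (hypothetical): τ ::= α | τ → τ | int *)
type ty =
  | TVar of string
  | Arrow of ty * ty
  | Int

(* A type scheme ∀ᾱ.τ *)
type scheme = Forall of string list * ty

module S = Set.Make (String)

(* Free type variables of a type. *)
let rec ftv (t : ty) : S.t =
  match t with
  | TVar a -> S.singleton a
  | Arrow (t1, t2) -> S.union (ftv t1) (ftv t2)
  | Int -> S.empty

let ftv_scheme (Forall (bound, t)) = S.diff (ftv t) (S.of_list bound)

(* Generalisation: quantify the variables free in [t] but not free in [ctx]. *)
let generalise (ctx : (string * scheme) list) (t : ty) : scheme =
  let ctx_ftv =
    List.fold_left (fun acc (_, s) -> S.union acc (ftv_scheme s)) S.empty ctx
  in
  Forall (S.elements (S.diff (ftv t) ctx_ftv), t)

(* Instantiation: {τ̄/ᾱ}τ, substituting [args] for the bound variables.
   List.combine raises Invalid_argument if the lengths differ. *)
let instantiate (Forall (bound, t)) (args : ty list) : ty =
  let s = List.combine bound args in
  let rec go = function
    | TVar a -> (match List.assoc_opt a s with Some t' -> t' | None -> TVar a)
    | Arrow (t1, t2) -> Arrow (go t1, go t2)
    | Int -> Int
  in
  go t
```

A type variable that is free in the context is left unquantified, which is exactly what prevents fun-bound variables from becoming polymorphic.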
Type inference is simply the process of finding a type τ for an expression e such that Γ ⊢ e : τ. Milner’s essential insight for efficient type inference is that the typing rules are syntax-directed; that is to say, at most one typing rule applies to each form of expression. Consequently, the shape of the derivation tree for proving Γ ⊢ e : τ is uniquely determined by the form of e.
So we can, in effect, run the typing rules “backwards” and “guess” types by introducing new type
variables α; α is then subject to certain constraints induced by subsequent typing rules. Thus,
type inference for ML decomposes into constraint generation followed by constraint solving.
Key Takeaway: ML provides parametric polymorphism with efficient decidable type inference
based on syntax-directed rules. Inference may be split up into constraint generation and constraint
solving phases.
2.1.2 Constraints
The interface between constraint generation and solving is the constraint language. The syntax
of constraints and constrained type schemes [38] is given by the grammar:
C ::= true | false | τ = τ | C ∧ C | ∃α.C
| def x : σ in C | x ≤ τ | σ ≤ τ ,
σ ::= ∀α.C ⇒ τ .
Constraints naturally model a subset of first-order logic equipped with equations between types,
consisting of conjunction and existential quantification.
Type inference for ML requires generalisation and instantiation. In order to permit these
operations, we require the last three constraint constructs and constrained type schemes. The
constraint def x : σ in C associates the (constrained) type scheme σ with x in the constraint C
(where x may appear as a free variable). The constraint x ≤ τ is an instantiation constraint, read as “τ is an instance of x”; it holds if τ is an instance of the type scheme σ associated with x. Similarly, σ ≤ τ holds if τ is an instance of σ. Constrained type schemes were introduced to avoid interleaving constraint generation and solving, since determining the type scheme ∀ᾱ.τ requires constraint solving. We often write τ or ∀ᾱ.τ as syntactic sugar for ∀·.true ⇒ τ and ∀ᾱ.true ⇒ τ, respectively.
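The grammar above maps directly onto an OCaml datatype. The sketch below is one possible encoding, assuming type variables and term variables are represented as strings; the constructor names (Inst for x ≤ τ, Scheme_inst for σ ≤ τ, and so on) are illustrative, and Dromedary’s actual constraint library is structured differently.

```ocaml
(* Hypothetical encoding of the constraint grammar; not Dromedary's API. *)
type ty = TVar of string | Arrow of ty * ty | Int

type constr =
  | True
  | False
  | Eq of ty * ty                    (* τ = τ *)
  | And of constr * constr           (* C ∧ C *)
  | Exists of string list * constr   (* ∃ᾱ.C *)
  | Def of string * scheme * constr  (* def x : σ in C *)
  | Inst of string * ty              (* x ≤ τ *)
  | Scheme_inst of scheme * ty       (* σ ≤ τ *)

and scheme = Forall of string list * constr * ty  (* ∀ᾱ.C ⇒ τ *)

(* Sugar: a plain type τ stands for ∀·.true ⇒ τ. *)
let mono t = Forall ([], True, t)

let rec string_of_ty = function
  | TVar a -> a
  | Int -> "int"
  | Arrow (t1, t2) -> "(" ^ string_of_ty t1 ^ " -> " ^ string_of_ty t2 ^ ")"

let rec string_of_constr = function
  | True -> "true"
  | False -> "false"
  | Eq (t1, t2) -> string_of_ty t1 ^ " = " ^ string_of_ty t2
  | And (c1, c2) -> string_of_constr c1 ^ " /\\ " ^ string_of_constr c2
  | Exists (vs, c) -> "exists " ^ String.concat " " vs ^ ". " ^ string_of_constr c
  | Def (x, s, c) ->
      "def " ^ x ^ " : " ^ string_of_scheme s ^ " in " ^ string_of_constr c
  | Inst (x, t) -> x ^ " <= " ^ string_of_ty t
  | Scheme_inst (s, t) -> string_of_scheme s ^ " <= " ^ string_of_ty t

and string_of_scheme (Forall (vs, c, t)) =
  "forall " ^ String.concat " " vs ^ ". " ^ string_of_constr c ^ " => " ^ string_of_ty t
```

For example, def x : int in x ≤ int prints as "def x : forall . true => int in x <= int".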
Constraints are a formal logic: a syntax with a semantic interpretation. Semantically, constraints are interpreted in the Herbrand universe, that is, the set of ground types:
$$ t ::= \bar{t}\ F $$
A ground assignment ϕ is a partial function from type variables to ground types. Similarly, an environment ρ is a partial function from term variables x to sets of ground types. The interpretation of constraints is defined inductively in Figure 2.2 by the judgement ϕ; ρ ⊨ C, read as: in the environment ρ, the ground assignment ϕ satisfies C.
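$$
\frac{}{\phi;\rho \models \mathsf{true}}
\qquad
\frac{\phi(\tau_1) = \phi(\tau_2)}{\phi;\rho \models \tau_1 = \tau_2}
\qquad
\frac{\phi;\rho \models C_1 \qquad \phi;\rho \models C_2}{\phi;\rho \models C_1 \wedge C_2}
$$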
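For the quantifier-free fragment (true, equations and conjunction), the rules above can be read directly as a checker. The sketch below assumes ground types are trees of constructor names and, as a modelling choice, treats an equation mentioning a variable outside the domain of the partial assignment ϕ as unsatisfied; ∃ and def are omitted, since checking them would require searching the infinite Herbrand universe. All names are hypothetical.

```ocaml
(* Ground types t ::= t̄ F, e.g. G ("list", [ G ("int", []) ]). *)
type ground = G of string * ground list

type ty = TVar of string | TCon of string * ty list

type constr = True | Eq of ty * ty | And of constr * constr

(* A ground assignment ϕ: a partial map from type variables to ground types. *)
type assignment = (string * ground) list

(* ϕ(τ), or None if τ mentions a variable outside the domain of ϕ. *)
let rec apply (phi : assignment) (t : ty) : ground option =
  match t with
  | TVar a -> List.assoc_opt a phi
  | TCon (f, ts) ->
      let args = List.map (apply phi) ts in
      if List.for_all Option.is_some args
      then Some (G (f, List.map Option.get args))
      else None

(* ϕ ⊨ C for the fragment true / τ1 = τ2 / C1 ∧ C2; ρ plays no role here. *)
let rec satisfies (phi : assignment) (c : constr) : bool =
  match c with
  | True -> true
  | Eq (t1, t2) -> (
      match (apply phi t1, apply phi t2) with
      | Some g1, Some g2 -> g1 = g2
      | _ -> false)
  | And (c1, c2) -> satisfies phi c1 && satisfies phi c2
```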
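We interpret the constrained type scheme ∀ᾱ.C ⇒ τ under the assignment ϕ and environment ρ as the set of ground types ϕ′(τ), where ϕ′ ranges over the assignments that are equal to ϕ modulo ᾱ, denoted $\phi =_{\setminus\bar\alpha} \phi'$, and that satisfy C:
$$
(\phi;\rho)(\forall\bar\alpha.C \Rightarrow \tau) = \{\, \phi'(\tau) : \phi =_{\setminus\bar\alpha} \phi' \wedge (\phi';\rho \models C) \,\},
$$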
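where assignments ϕ and ϕ′ are said to be equal modulo ᾱ if they agree on every type variable not in ᾱ. Entailment and equivalence of constraints are then defined as:
$$
\begin{aligned}
&\text{Entailment} && C_1 \models C_2 \;\triangleq\; \forall \phi, \rho.\ (\phi;\rho \models C_1) \implies (\phi;\rho \models C_2),\\
&\text{Equivalence} && C_1 \simeq C_2 \;\triangleq\; \forall \phi, \rho.\ (\phi;\rho \models C_1) \iff (\phi;\rho \models C_2).
\end{aligned}
$$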
As suggested in Section 2.1.1, we can now reduce the problem of type inference in ML to constraint solving by defining a mapping ⟦e : τ⟧ of candidate typings to constraints, given in Figure 2.3.
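$$
\begin{aligned}
[\![ x : \tau ]\!] &= x \le \tau \\
[\![ c : \tau ]\!] &= \Delta(c) \le \tau \\
[\![ \lambda x.e : \tau ]\!] &= \exists \alpha_1 \alpha_2.\ \tau = \alpha_1 \to \alpha_2 \wedge \mathsf{def}\ x : \alpha_1\ \mathsf{in}\ [\![ e : \alpha_2 ]\!] && \text{if } \alpha_1, \alpha_2 \mathrel{\#} \tau \\
[\![ e_1\, e_2 : \tau ]\!] &= \exists \alpha.\ [\![ e_1 : \alpha \to \tau ]\!] \wedge [\![ e_2 : \alpha ]\!] && \text{if } \alpha \mathrel{\#} \tau \\
[\![ \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 : \tau ]\!] &= (\exists \alpha.C) \wedge \mathsf{def}\ x : \forall \alpha.C \Rightarrow \alpha\ \mathsf{in}\ [\![ e_2 : \tau ]\!] && \text{where } C = [\![ e_1 : \alpha ]\!] \text{ and } \alpha \mathrel{\#} \tau
\end{aligned}
$$
Figure 2.3: The constraint generation mapping for ML – ⟦e : τ⟧ is the constraint that holds if and only if e has the type τ.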
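The mapping in Figure 2.3 translates almost line for line into OCaml. The sketch below uses hypothetical names (gen, fresh, size, nest) and a global counter for fresh type variables, which is a simplification; it also counts constraint nodes, counting every occurrence of a duplicated subconstraint, to make the duplication in the let case visible.

```ocaml
(* Hypothetical minimal syntax; Dromedary's real datatypes differ. *)
type ty = TVar of string | Arrow of ty * ty

type constr =
  | True
  | Eq of ty * ty
  | And of constr * constr
  | Exists of string list * constr
  | Def of string * scheme * constr
  | Inst of string * ty
and scheme = Forall of string list * constr * ty

type expr =
  | Var of string
  | Fun of string * expr
  | App of expr * expr
  | Let of string * expr * expr

(* Globally fresh names discharge side conditions such as α # τ. *)
let counter = ref 0

let fresh () =
  incr counter;
  "a" ^ string_of_int !counter

(* [gen e t] is ⟦e : t⟧ from Figure 2.3. *)
let rec gen (e : expr) (t : ty) : constr =
  match e with
  | Var x -> Inst (x, t)
  | Fun (x, body) ->
      let a1 = fresh () in
      let a2 = fresh () in
      Exists
        ( [ a1; a2 ],
          And
            ( Eq (t, Arrow (TVar a1, TVar a2)),
              Def (x, Forall ([], True, TVar a1), gen body (TVar a2)) ) )
  | App (e1, e2) ->
      let a = fresh () in
      Exists ([ a ], And (gen e1 (Arrow (TVar a, t)), gen e2 (TVar a)))
  | Let (x, e1, e2) ->
      let a = fresh () in
      let c = gen e1 (TVar a) in
      (* C = ⟦e1 : α⟧ appears twice: (∃α.C) ∧ def x : ∀α.C ⇒ α in ... *)
      And (Exists ([ a ], c), Def (x, Forall ([ a ], c, TVar a), gen e2 t))

(* Constraint size, counting each occurrence of a duplicated C separately. *)
let rec size = function
  | True | Eq _ | Inst _ -> 1
  | And (c1, c2) -> 1 + size c1 + size c2
  | Exists (_, c) -> 1 + size c
  | Def (_, Forall (_, c1, _), c2) -> 1 + size c1 + size c2

(* let x = (let x = ... (fun y -> y) ... in x) in x, nested n levels deep. *)
let rec nest n = if n = 0 then Fun ("y", Var "y") else Let ("x", nest (n - 1), Var "x")
```

With this encoding, size (gen (nest n) t) equals 10·2ⁿ − 4, since each level contains two copies of the constraint for its bound expression.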
A problem with the definition of constraint generation in Figure 2.3 is that the constraint C = ⟦e1 : α⟧ occurs twice in ⟦let x = e1 in e2 : τ⟧. Since e1 may itself contain let-bindings, the constraint size can double at each level of nesting, which can lead to exponential complexity. Fortunately, we can avoid this by extending the constraint language with the following construct:
C ::= . . . | let x : σ in C,
where ⦇e⦈ is a principal constrained type scheme for e: it denotes the set of all ground types that e admits.
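Following the usual presentation of this construct [38], the let constraint can be understood through the first equivalence below; the let case of Figure 2.3 can then be rewritten so that ⟦e1 : α⟧ is mentioned only once. This is a sketch of the standard formulation rather than a transcription of Dromedary’s own rules:

$$
\begin{aligned}
\mathsf{let}\ x : \forall\bar\alpha.C_1 \Rightarrow \tau\ \mathsf{in}\ C_2 \;&\simeq\; (\exists\bar\alpha.C_1) \wedge \mathsf{def}\ x : \forall\bar\alpha.C_1 \Rightarrow \tau\ \mathsf{in}\ C_2,\\
(\!|e|\!) \;&=\; \forall\alpha.[\![ e : \alpha ]\!] \Rightarrow \alpha,\\
[\![ \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 : \tau ]\!] \;&=\; \mathsf{let}\ x : (\!|e_1|\!)\ \mathsf{in}\ [\![ e_2 : \tau ]\!].
\end{aligned}
$$

Because ⟦e1 : α⟧ now occurs once, the size of the generated constraint is linear in the size of the expression.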
Key Takeaway: Constraints form the interface between constraint generation and solving. ML
constraint generation has linear complexity, making constraints suitable for an efficient and
modular implementation of type inference. Extensibility of constraints demonstrates the ability
to shift complexity between the constraint solving and generation phases.
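$$
\frac{C \models x \le \tau}{C \vdash x : \tau}\;\text{(PCB-var)}
\qquad
\frac{C_1 \vdash e_1 : \tau_1 \to \tau_2 \qquad C_2 \vdash e_2 : \tau_1}{C_1 \wedge C_2 \vdash e_1\, e_2 : \tau_2}\;\text{(PCB-app)}
$$
$$
\frac{C \vdash e : \tau_2}{\mathsf{def}\ x : \tau_1\ \mathsf{in}\ C \vdash \mathsf{fun}\ x \to e : \tau_1 \to \tau_2}\;\text{(PCB-fun)}
$$
$$
\frac{C_1 \vdash e_1 : \tau_1 \qquad C_2 \vdash e_2 : \tau_2}{\mathsf{let}\ x : \forall \mathrm{fv}(C_1, \tau_1).C_1 \Rightarrow \tau_1\ \mathsf{in}\ C_2 \vdash \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 : \tau_2}\;\text{(PCB-let)}
$$
$$
\frac{C \vdash e : \tau_1}{C \wedge \tau_1 = \tau_2 \vdash e : \tau_2}\;\text{(PCB-eq)}
\qquad
\frac{C \vdash e : \tau \qquad \alpha \mathrel{\#} \tau}{\exists \alpha.C \vdash e : \tau}\;\text{(PCB-exists)}
$$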
Judgements take the form C ⊢ e : τ, read as: under the satisfiable assumptions C, the expression e has the type τ, where the constraint C may contain free type and free term variables. We identify judgements modulo constraint equivalence of their assumptions; that is, C ⊢ e : τ and D ⊢ e : τ are equivalent when C ≃ D holds. The type system is described in Figure 2.4.
The PCB-var rule states that x has the type τ under the assumption that τ is an instance of x.
Unlike ML, no typing context is consulted. The environment is implicit within the constraint;
thus, any free term (and type) variables in e also occur in C.
PCB-app is analogous to ML-app. PCB-fun requires the function body e to have the return type τ2 under C. In the rule’s conclusion, we wrap C in def x : τ1 in C, assuming parameter x has type τ1, which permits C to contain instantiation constraints of the form x ≤ τ. PCB-let is similar to PCB-fun but uses a let constraint to assign x a constrained type scheme; the free variables fv(C1, τ1) in the quantifier ensure the scheme is closed. Both PCB-eq and PCB-exists are non-syntax-directed rules required for soundness.
Using PCB as the basis of Dromedary’s type system yields a simpler formalisation. The inclusion of structured constraints in the typing rules benefits advanced features that rely on constraints, most notably GADTs (Section 3.1.6). Additionally, metatheoretic properties such as soundness and completeness [38] of constraint-based inference for PCB may be stated directly without relying on substitutions, resulting in straightforward correctness proofs for Dromedary’s inference³.
Key Takeaway: PCB is a purely constraint-based presentation of ML; its advantages include
a simpler and more intuitive formalisation and easier correctness proofs for constraint-based
inference, which Dromedary benefits from.
2.2 OCaml
OCaml is a general-purpose, high-level programming language combining functional, object-
oriented, and imperative paradigms – with one of the most sophisticated and powerful type
inference algorithms available – based on the ML calculus from Section 2.1.1.
In this section, we describe several OCaml design patterns utilised throughout our codebase,
as well as a selection of sophisticated type system features implemented in Dromedary.
² A purely constraint-based type system.
³ In future work.
OCaml extends ML with a wide range of features, including mutual recursion, algebraic data types (ADTs), and patterns. Algebraic data types are defined using a combination of records and variants (so-called product and sum types). For example, Listing 2.2 defines the ADTs bin_op, expr, and binding.
type expr =
| Int of int
(** Integer constant [1, -3, ...] *)
| Var of string
(** Variables [x, eval, ...] *)
| Let of { bindings: binding list; in_: expr }
(** Let bindings [let x1 = e1 and ... xn = en in e] *)
| Bin_op of { left: expr; op: bin_op; right: expr }
(** Infix binary operators [e1 + e2, e1 - e2] *)
and binding =
{ var: string
; exp: expr
}
Listing 2.2: The type definitions for a simple language using algebraic data types in OCaml – expr and bin_op are variant types and binding is a record type.
Values are specified using their type (signature). A structure is the implementation of a signature – implementing each type and value specified in the signature.
Components of a module are referred to through qualified identifiers (known as “dot notation”) or using open Module_name for unqualified access (Listing 2.3).
Modules may be parameterised by other modules using functors⁴, which are (informally) functions from modules to modules. In Dromedary, functors are foundational for implementing the modular constraints library.
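open Core

module Env : sig
  type 'a t = (string * 'a) list
  val add : 'a t -> string -> 'a -> 'a t
  exception Not_found
  val find_exn : 'a t -> string -> 'a
end = struct
  type 'a t = (string * 'a) list
  let add t x n =
    List.Assoc.add t x n ~equal:String.equal
  exception Not_found
  let find_exn t x =
    try List.Assoc.find_exn t x ~equal:String.equal
    with _ -> raise Not_found
end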
Listing 2.3: A snippet demonstrating OCaml’s module structures and signatures.
Key Takeaway: Modules encapsulate and structure large OCaml programs. Functors are
parameterised modules – used to decouple dependencies between modules, increasing modularity.
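As a small illustration of a functor, the sketch below generalises an environment like Env over its key type. Make_env, the Key signature and String_env are hypothetical names; unlike Listing 2.3, the sketch uses only the OCaml standard library rather than Core.

```ocaml
(* The requirements on keys: a type and an equality test. *)
module type Key = sig
  type t
  val equal : t -> t -> bool
end

(* A functor: given any module K matching Key, build an environment over K.t. *)
module Make_env (K : Key) : sig
  type 'a t
  val empty : 'a t
  val add : 'a t -> K.t -> 'a -> 'a t
  val find : 'a t -> K.t -> 'a option
end = struct
  type 'a t = (K.t * 'a) list

  let empty = []

  (* Newer bindings shadow older ones, since [find] returns the first match. *)
  let add t k v = (k, v) :: t

  let find t k =
    List.find_map (fun (k', v) -> if K.equal k k' then Some v else None) t
end

(* The standard String module already provides [t] and [equal]. *)
module String_env = Make_env (String)
```

Because the signature keeps 'a t abstract, clients of String_env cannot depend on the association-list representation.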
2.2.2 Functors, Applicatives and Monads
Contrary to their cryptic nomenclature, functors, applicatives, and monads are simply functional
programming design patterns analogous to object-oriented design patterns. They are based on
the concept of composing various operations (or effects) to conceal complexity.
Intuitively, a functor is a polymorphic data structure 'a t that ‘wraps’ values of type 'a, with a function map which lifts a function f of type 'a -> 'b to a function on ‘wrapped’ values, 'a t -> 'b t (Listing 2.5). An applicative, or applicative functor, extends a functor by providing: (a) a function return that accepts any value and ‘wraps’ it; (b) an operation both that takes two values t1, t2 of types 'a t and 'b t, ‘unwraps’ both their values and ‘re-wraps’ them into a pair, yielding a value of type ('a * 'b) t – allowing independent operations to be sequenced. Monads extend applicative functors further, adding an operation bind, which permits the sequencing of dependent operations. Each structure and its operations must satisfy various laws known as the functor, applicative, and monad laws [55, 31] – which we omit.
let rec eval exp ~env =
  match exp with
  | Int n -> n
  | Var x ->
    Env.find_exn env x
  | Let { bindings; in_ } ->
    let env = bind ~env bindings in
    eval ~env in_
  | Bin_op { left; op; right } ->
    let n1 = eval ~env left
    and n2 = eval ~env right in
    eval_bin_op op n1 n2

and bind bindings ~env =
  (* Iterates over [bindings], adding each to [env] using
     [List.fold_right], returning the extended environment. *)
  List.fold_right bindings
    ~init:env
    ~f:(fun { var; exp } env ->
      Env.add env var (eval ~env exp))
Listing 2.4: An interpreter for the simple language from Listing 2.2. The implementation of eval uses mutual recursion and labelled arguments.
⁴ Not to be confused with functors (Section 2.2.2).
Applicatives are a fundamental abstraction in our constraints library (Section 3.2.2). Additionally, we use monads extensively in our codebase as a generic design pattern for encapsulating side-effects, such as explicitly propagating failure using the Result monad.
Key Takeaway: Functors, applicatives and monads are functional programming design patterns
(similar to OOP design patterns) that are used to hide complexity by providing the ability to
compose operations (or effects) on ‘wrapped’ values.
module type Functor = sig
type 'a t
val map : 'a t -> f:('a -> 'b) -> 'b t
end
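The applicative and monad signatures described above can be sketched by extending Functor with include. The signatures follow the operations named in the text (return, both and bind); the labelled ~f argument on bind and the Option_monad instance are assumptions made for illustration.

```ocaml
module type Functor = sig
  type 'a t
  val map : 'a t -> f:('a -> 'b) -> 'b t
end

(* An applicative adds [return] and [both] for combining independent values. *)
module type Applicative = sig
  include Functor
  val return : 'a -> 'a t
  val both : 'a t -> 'b t -> ('a * 'b) t
end

(* A monad adds [bind] for sequencing dependent computations. *)
module type Monad = sig
  include Applicative
  val bind : 'a t -> f:('a -> 'b t) -> 'b t
end

(* Example instance: option models computations that may fail. *)
module Option_monad : Monad with type 'a t = 'a option = struct
  type 'a t = 'a option

  let map t ~f = match t with Some x -> Some (f x) | None -> None
  let return x = Some x

  let both t1 t2 =
    match (t1, t2) with Some x, Some y -> Some (x, y) | _ -> None

  let bind t ~f = match t with Some x -> f x | None -> None
end
```

Note that both can only combine results that were computed independently, whereas bind lets the second computation depend on the first result.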
2.2.3 Generalised Algebraic Data Types
Generalised algebraic data types (GADTs), introduced by Xi et al. [56], allow one to describe
richer constraints between constructors and their types. The canonical example of GADTs is a
typed domain-specific language (DSL):
(** ADT encoding of [expr] *)
type expr =
| Int of int
| Pair of expr * expr
| Fst of expr
| Snd of expr
Listing 2.6: The type definition of a simple DSL in OCaml using ADTs and GADTs.
The formal details of the GADT definition are explained in Section 3.1.6. The important point
is that it allows us to give a more precise type for each constructor:
• Int n has the type int in the DSL, thus its type is int expr.
• The constructor Pair produces a pair from two expressions of types 'a expr and 'b expr, thus its type is ('a * 'b) expr.
• The constructor Fst projects the first element from a pair of type ('a * 'b) expr, thus its type is 'a expr.
Thus, with the GADT encoding, expressions such as Fst (Int 1) are ill-typed in OCaml, ruling out an error-prone programming style. Other compelling applications of GADTs will be discussed throughout this dissertation (Sections 2.2.5, 3.2.2).
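The GADT encoding described by the bullet points above can be sketched as follows. This is an illustrative reconstruction following the stated constructor types rather than a copy of Listing 2.6; the typed evaluator eval is an added example showing that no runtime checks for ill-formed projections are needed.

```ocaml
(* The type index records the DSL type of each expression. *)
type _ expr =
  | Int : int -> int expr
  | Pair : 'a expr * 'b expr -> ('a * 'b) expr
  | Fst : ('a * 'b) expr -> 'a expr
  | Snd : ('a * 'b) expr -> 'b expr

(* The locally abstract type [a] lets each branch refine the result type,
   and the explicit annotation permits recursive calls at other indices. *)
let rec eval : type a. a expr -> a = function
  | Int n -> n
  | Pair (e1, e2) -> (eval e1, eval e2)
  | Fst e -> fst (eval e)
  | Snd e -> snd (eval e)

(* Rejected by the type checker, because Int 1 : int expr is not a pair:
   let bad = Fst (Int 1) *)
```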
Problems with Inference One of the characteristic features of the ML type system is its ability to infer the principal (or most general) type for any well-typed expression.
Among the primary difficulties associated with inference in the presence of GADTs is the loss of principality. Sulzmann et al. [46] demonstrated that programs with GADTs frequently lack a principal type, admitting several incomparable most general types instead. To illustrate this, we consider the following example:
type (_, _) eq = Refl : ('a, 'a) eq
and x has the type 'a, then it may also have the type 'b. So one may deduce that coerce has the type ('a, 'b) eq -> 'a -> 'b.
However, coerce in fact admits three incomparable most general types:
• ('a, 'b) eq -> 'c -> 'c
• ('a, 'b) eq -> 'a -> 'b
• ('a, 'b) eq -> 'b -> 'a
This poses various problems. First, principality is a central property for efficient type inference, since it allows us to make locally optimal decisions. Second, if a program admits several incomparable types, which should we infer? To circumvent this, we often restrict the type system or rely on explicit annotations.
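As a sketch of the annotation-based workaround, OCaml accepts each candidate type for coerce once it is requested explicitly using locally abstract types. The names coerce and coerce_back are illustrative.

```ocaml
type (_, _) eq = Refl : ('a, 'a) eq

(* The annotation selects ('a, 'b) eq -> 'a -> 'b among the candidates:
   matching on Refl brings the equation a = b into scope. *)
let coerce : type a b. (a, b) eq -> a -> b =
 fun eq x -> match eq with Refl -> x

(* The symmetric candidate must be requested with its own annotation. *)
let coerce_back : type a b. (a, b) eq -> b -> a =
 fun eq x -> match eq with Refl -> x
```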
As described above, deconstructing GADTs using pattern matching introduces local typing constraints. However, these constraints may result in differing branch types in a match expression. Reconciling these types is difficult.
Key Takeaway: GADTs allow richer constraints between data constructors and their types.
However, inference is notoriously difficult, suffering from a loss of principality, irreconcilable
branch types and reliance on polymorphic recursion (Section 2.2.4).
Key Takeaway: Polymorphic recursion refers to recursive functions where each recursive
occurrence is a non-trivial instantiation of a type scheme. Notable type system features that
rely on polymorphic recursion include GADTs, region-based memory management [54], and
binding-time analysis [10].
Figure 2.5: The ML typing rules for polymorphic recursion from the Milner-Mycroft calculus
[35].
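A minimal sketch of polymorphic recursion, assuming the perfect tree type has the constructors Leaf : ∀α.α → α perfect tree and Node : ∀α.α × (α × α) perfect tree → α perfect tree given in Section 3.1.1. The function size is a hypothetical example; without the explicit 'a. annotation, OCaml rejects it, because the recursive call is at type ('a * 'a) perfect_tree rather than 'a perfect_tree.

```ocaml
(* A nested (non-regular) datatype: the recursive occurrence is at 'a * 'a. *)
type 'a perfect_tree =
  | Leaf of 'a
  | Node of 'a * ('a * 'a) perfect_tree

(* Counts the elements of type 'a. Each element of the inner tree is a pair,
   so it contributes two elements of type 'a. *)
let rec size : 'a. 'a perfect_tree -> int = function
  | Leaf _ -> 1
  | Node (_, t) -> 1 + (2 * size t)
```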
Listing 2.10: The type definition of a dependent associative list in OCaml using GADTs.
Now suppose we wish to write a function map_elem that applies a function f to each 'a value for each 'a key. For instance, we could write:
let map_elem (Elem (key, val_)) ~f =
Elem (key, f key val_)
However, this is ill-typed in OCaml. As with polymorphic recursion, this is because f occurs monomorphically in map_elem. Since f must be instantiated with an arbitrary key type 'a, a correct System F type for map_elem would give f the polymorphic type ∀α. α key → α → α, making map_elem a higher-rank function.
Unfortunately, OCaml does not support this form of higher-rank polymorphism directly, since type inference for arbitrary-rank polymorphism, as in System F, is undecidable. In OCaml, we use semi-explicit first-class polymorphism [18] in record types to introduce these universally quantified types:
type elem_mapper = { f : 'a. 'a key -> 'a -> 'a }
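Putting the pieces together, the sketch below shows map_elem taking an elem_mapper. Since Listing 2.10 is not reproduced here, the key GADT (Int_key, String_key), the elem type and the helpers bump and to_string are hypothetical stand-ins, and map_elem takes the mapper as ~mapper rather than ~f.

```ocaml
(* Hypothetical key GADT: each key determines the type of its value. *)
type _ key =
  | Int_key : int key
  | String_key : string key

(* An element pairs a key with a value of the key's type; 'a is existential. *)
type elem = Elem : 'a key * 'a -> elem

(* Semi-explicit first-class polymorphism: the field f is polymorphic in 'a. *)
type elem_mapper = { f : 'a. 'a key -> 'a -> 'a }

(* mapper.f is instantiated at the existential type hidden inside Elem. *)
let map_elem (e : elem) ~(mapper : elem_mapper) : elem =
  match e with
  | Elem (key, v) -> Elem (key, mapper.f key v)

(* A polymorphic mapper, defined with a locally abstract type [a]. *)
let bump : elem_mapper =
  let f (type a) (key : a key) (v : a) : a =
    match key with
    | Int_key -> v + 1
    | String_key -> String.uppercase_ascii v
  in
  { f }

let to_string (e : elem) : string =
  match e with
  | Elem (Int_key, n) -> string_of_int n
  | Elem (String_key, s) -> s
```

The record wrapper is what makes this typeable: the polymorphism of f is declared once in elem_mapper, so inference never has to guess a higher-rank type.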
Key Takeaway: Semi-explicit first-class polymorphism provides the ability to express higher-
rank polymorphism, as in System F, using polymorphic records. As a result, we can express
more programs that would otherwise be ill-typed in the Damas-Milner “sweet spot”.
The primary characteristic that differentiates polymorphic variants from ordinary variant types is their ability to be used without an explicit type declaration⁶. The polymorphic variant `Not_a_number, for example, has the inferred type [> `Not_a_number ]. The > symbol at the beginning of a polymorphic variant type indicates that the type is open: the listed constructors form a lower bound. We can interpret [> `Not_a_number ] as a variant that contains at least the constructor `Not_a_number. Similarly, polymorphic variants may also have an upper bound. For instance:
let is_a_number t =
match t with
| `Not_a_number -> false
| `Int _ -> true
type number = [ `Int of int | `Float of float | `Not_a_number ]
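A short sketch of how the closed type number interacts with subtyping: a function annotated with number accepts exactly its three tags, and a value of a smaller closed type can be coerced up with :>. The names to_float, ints and as_floats are illustrative.

```ocaml
type number = [ `Int of int | `Float of float | `Not_a_number ]

(* A closed type: this function accepts exactly the three tags of [number]. *)
let to_float : number -> float option = function
  | `Int n -> Some (float_of_int n)
  | `Float f -> Some f
  | `Not_a_number -> None

(* [ `Int of int ] is a smaller closed type; :> coerces its values up to [number]. *)
let ints : [ `Int of int ] list = [ `Int 1; `Int 2 ]

let as_floats = List.map (fun (x : [ `Int of int ]) -> to_float (x :> number)) ints

(* By contrast, passing a [number] to is_a_number would be rejected, since
   `Float lies outside its upper bound [< `Int of 'a | `Not_a_number ]. *)
```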
One of the pragmatic uses of polymorphic variants is extensible error types. For example, in Listing 2.12, the solver_error type extends the unifier_error type.
While polymorphic variants appear to be a superset of ordinary variants, their inference is far more complex – potentially resulting in cyclic types. Furthermore, due to their more expressive typing rules, they are less likely to catch type-level bugs⁷.
type unifier_error =
[ `Cyclic_type of Type.t
| `Cannot_unify of Type.t * Type.t
]
type solver_error =
[ unifier_error | `Unbound_variable of string ]
Listing 2.12: Extensible error types using polymorphic variants in OCaml, taken from the
project implementation.
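The benefit of this extension shows up when handling errors: the #unifier_error pattern matches every constructor of unifier_error, so a handler for solver_error can delegate to one for unifier_error. In this sketch, Type.t is replaced by a hypothetical string alias ty so that the example is self-contained, and the printing functions are illustrative.

```ocaml
(* Hypothetical stand-in for the project's Type.t. *)
type ty = string

type unifier_error =
  [ `Cyclic_type of ty
  | `Cannot_unify of ty * ty
  ]

type solver_error = [ unifier_error | `Unbound_variable of string ]

let string_of_unifier_error : unifier_error -> string = function
  | `Cyclic_type t -> "cyclic type " ^ t
  | `Cannot_unify (t1, t2) -> "cannot unify " ^ t1 ^ " with " ^ t2

(* #unifier_error matches every tag of unifier_error. *)
let string_of_solver_error : solver_error -> string = function
  | #unifier_error as err -> string_of_unifier_error err
  | `Unbound_variable x -> "unbound variable " ^ x

(* Any unifier error is also a solver error, via an explicit coercion. *)
let lift (err : unifier_error) : solver_error = (err :> solver_error)
```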
Key Takeaway: Polymorphic variants enhance the flexibility and modularity of ordinary
variants by leveraging structural polymorphism (subtyping). However, their inference is far more
complex, and they incur a runtime performance cost.
MoSCoW Priority Feature
accepting it as correct or rectifying a fault. This was particularly advantageous when writing
tests involving large syntax trees generated by our inference algorithm. In total, I wrote 514
tests.
I built Continuous Integration workflows that automate the execution of regression tests, coverage checks and formatters for each commit. Coverage was tracked using Coveralls.io [7] and Bisect [2], achieving 75% test coverage⁹.
I used LexiFi’s landmarks [49] library for profiling, allowing me to optimise Dromedary’s constraint solver. In the evaluation, I used Core Bench [3] to micro-benchmark Dromedary and OCaml: running the type checkers multiple times and measuring mean runtime and memory usage with error bounds.
2.3.3 License
This project is intended as a proof-of-concept and foundation for implementing constraint-based
type inference for OCaml. Therefore, the code was made publicly available on GitHub under
the MIT licence [22] – permitting any person to use, copy, modify, and distribute the software.
2.5 Summary
We introduced the ML calculus and its constraint-based counterpart PCB in Section 2.1, which
will serve as the foundation of Dromedary’s type system. We discussed various advanced type
system features in OCaml that we implement in Dromedary and some design patterns that are
prevalent in our implementation. We also defended decisions on features, tools, and dependencies
used and our overall software engineering methodology.
⁹ Many unchecked lines were interface and type definitions, which lowers the reported coverage percentage.
3 Implementation
The implementation of Dromedary is described in two sections. The first (Section 3.1) covers
the design of Dromedary, along with the description of its type system. The second (Section
3.2) explores the practical implementation of Dromedary’s inference, with a particular emphasis
on design decisions that result in efficient and modular constraint-based inference.
3.1 Dromedary
Dromedary is a subset of the OCaml language, supporting all of Core ML: ML polymorphism,
algebraic data types, type annotations, and side-effecting primitives. Additionally, Dromedary
includes many of OCaml’s advanced type system features, namely type abbreviations,
extensible variants, abstract types, polymorphic recursion, semi-explicit first-class polymorphism,
GADTs, and polymorphic variants.
The untyped syntax of Dromedary is given using BNF in Appendix A. We have designed a type
system for Dromedary, based on PCB’s type system presented in Section 2.1.3. It is the
first unified formalisation of (a substantial subset of) OCaml’s type system in a constraint-based
setting.
This section discusses a selection of the aforementioned language features and their formalisation
– referencing selected typing rules. The majority of features are orthogonal and will be discussed
as independent extensions to the ML type system. For the mathematically inclined reader,
Appendix C provides a complete formalisation of the type system.
where [S] denotes that the syntactic element S is optional. A structural environment Ψ consists
of a sequence of type definitions. The (closed) type schemes assigned to K and ℓ in Ψ, which
one may derive from the type definition of F, are written as:
$$\Psi \vdash K : \forall \alpha.\, [\tau \to]\ \alpha\, F, \qquad \Psi \vdash \ell : \forall \alpha.\, \tau \to \alpha\, F.$$
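For example, the algebraic data type α perfect_tree (Listing 2.8) has the following constructors:
$$\mathsf{Leaf} : \forall \alpha.\, \alpha \to \alpha\ \mathsf{perfect\_tree}, \qquad \mathsf{Node} : \forall \alpha.\, \alpha \times (\alpha \times \alpha)\ \mathsf{perfect\_tree} \to \alpha\ \mathsf{perfect\_tree}.$$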
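We extend the constraint language with instantiation constraints for data constructors, K ≤ τ,
and labels, ℓ ≤ τ, semantically defined such that the following equivalences hold:
$$K \le [\tau_1 \to]\ \tau_2 \;\simeq\; \exists \alpha.\, [\tau_1 = \tau \wedge]\ \tau_2 = \alpha\, F \quad \text{if } \Psi \vdash K : \forall \alpha.\, [\tau \to]\ \alpha\, F,$$
$$\ell \le \tau_1 \to \tau_2 \;\simeq\; \exists \alpha.\, \tau_1 = \tau \wedge \tau_2 = \alpha\, F \quad \text{if } \Psi \vdash \ell : \forall \alpha.\, \tau \to \alpha\, F,$$
where Ψ is implicit.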
To support patterns that bind several variables at once, we introduce the notions of
constrained contexts and fragments. A fragment ∆ is a mapping from term variables to
their types: ∆ ::= · | ∆, x : τ. Intuitively, a fragment is the typing context introduced when
a value is successfully matched against a pattern.
A constrained context Γ ::= ∀α.C ⇒ ∆ specifies a mapping from term variables x to (constrained)
type schemes ∀α.C ⇒ ∆(x). These are required for the efficient (linear) generation of constraints
for let-bound pattern matching. We write ∆ for the context of the form ∀ · .true ⇒ ∆. The
constraint language is suitably extended with constrained contexts:
These multi-variadic bindings are semantically equivalent to nested def and let constraints:
The type system for algebraic data types has three judgement forms, corresponding to
patterns, cases, and expressions. Judgements for patterns and cases have the forms C ⊢ p : τ ⇝ ∆
and C ⊢ p → e : τ1 ⇒ τ2, read respectively as: under the satisfiable assumptions C, the pattern p
has the type τ and binds the variables in the fragment ∆; and under the satisfiable assumptions C,
the case p → e matches values of type τ1 and returns values of type τ2.
The typing rules in Figure 3.1 may be read as follows:
• Dromedary-pat-var: If the pattern x matches a value of type τ , then it binds x with type
τ in the fragment.
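$$\frac{}{C \vdash x : \tau \leadsto x : \tau}\ (\textsc{Dromedary-pat-var})$$
$$\frac{C \Vdash K \le \tau_1 \to \tau_2 \qquad C \vdash p : \tau_1 \leadsto \Delta}{C \vdash K\, p : \tau_2 \leadsto \Delta}\ (\textsc{Dromedary-pat-construct})$$
$$\frac{C_1 \vdash p : \tau_1 \leadsto \Delta \qquad C_2 \vdash e : \tau_2}{C_1 \wedge \mathsf{def}\ \Delta\ \mathsf{in}\ C_2 \vdash p \to e : \tau_1 \Rightarrow \tau_2}\ (\textsc{Dromedary-case})$$
$$\frac{C_p \vdash p : \tau_1 \leadsto \Delta \qquad C_1 \vdash e_1 : \tau_1 \qquad C_2 \vdash e_2 : \tau_2}{\mathsf{let}\ \forall \mathrm{fv}(C_p, C_1, \Delta).\, C_p \wedge C_1 \Rightarrow \Delta\ \mathsf{in}\ C_2 \vdash \mathsf{let}\ p = e_1\ \mathsf{in}\ e_2 : \tau_2}\ (\textsc{Dromedary-exp-let})$$
Figure 3.1: A selection of Dromedary’s typing rules related to algebraic data types.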
• Dromedary-exp-let: This is analogous to the PCB-let rule (Section 2.1.3). The novelty is
the use of a constrained context and pattern judgement to determine the bound variables.
$$e ::= \dots \mid (e : \tau)$$
$$\frac{C \vdash e : \tau}{C \vdash (e : \tau) : \tau}\ (\textsc{Dromedary-exp-constraint})$$
However, unlike in OCaml, type variables α are not implicitly bound in Dromedary: to be used in
an annotation, a type variable must first be introduced. We made this design choice to ensure
a more uniform and principled approach to annotations:
Type variables are either bound existentially (flexibly) or universally (rigidly). If α is existentially
bound, then both (fun x → x + 1 : α → α) and (fun x → x : α → α) are well-typed, whereas
if α is universally bound, only (fun x → x : α → α) is well-typed, since
(fun x → x + 1 : α → α) is only well-typed for α = int.
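OCaml draws a similar distinction, although with different syntax from Dromedary’s exists and forall forms: a named type variable such as 'a in an annotation is flexible, much like an existentially bound α, whereas a locally abstract type (type a) is rigid, much like a universally bound α. The sketch below uses OCaml syntax, not Dromedary’s, to illustrate the two examples above:

```ocaml
(* Flexible: 'a is a unification variable, so OCaml may instantiate it to int.
   The resulting type of succ_flex is int -> int. *)
let succ_flex : 'a -> 'a = fun x -> x + 1

(* Rigid: [a] is a locally abstract type, so the body must be generic in [a]. *)
let id_rigid (type a) : a -> a = fun x -> x

(* Rejected by OCaml, since the rigid [a] cannot be unified with int:
     let succ_rigid (type a) : a -> a = fun x -> x + 1 *)
```

After generalisation, id_rigid is polymorphic and can be used at any type, whereas succ_flex is monomorphic at int.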
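The typing rule for the existential form is straightforward, simply binding the variables α using
an existential quantifier in the conclusion:
$$\frac{C \vdash e : \tau \qquad \alpha \mathrel{\#} \tau}{\exists \alpha.\, C \vdash \mathsf{exists}\ (\mathsf{type}\ \alpha) \to e : \tau}\ (\textsc{Dromedary-exp-exists})$$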
The universal case is more difficult. To check that forall (type α) → e has the type τ, we
must check that e has the type τ for all instances of α. To express this, we introduce universal
quantification into the constraint language:
$$C ::= \dots \mid \forall \alpha.\, C$$
$$\frac{\forall t.\ \ \phi, \alpha \mapsto t; \rho \vDash C}{\phi; \rho \vDash \forall \alpha.\, C}$$
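While universal quantification is sufficient for typing the forall construct, to permit linear
complexity for type checking (and constraint generation) we extend constrained contexts (Section
3.1.1) with universally quantified variables α:
$$\Gamma ::= \forall \alpha, \beta.\, C \Rightarrow \Delta,$$
where let Γ in C is semantically defined by the equivalence:
$$\mathsf{let}\ \forall \alpha, \beta.\, C_1 \Rightarrow \Delta\ \mathsf{in}\ C_2 \;\simeq\; \forall \alpha.\, \exists \beta.\, C_1 \wedge \mathsf{def}\ \forall \alpha, \beta.\, C_1 \Rightarrow \Delta\ \mathsf{in}\ C_2.$$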
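$$\frac{C_1 \vdash e_1 : \tau_1 \qquad C_2 \vdash e_2 : \tau_2}{\mathsf{let\ rec}\ x : \forall \alpha.\, C_1 \Leftarrow \tau_1\ \mathsf{in}\ C_2 \vdash \mathsf{let\ rec}\ x : \forall \alpha.\, \tau_1 = e_1\ \mathsf{in}\ e_2 : \tau_2}\ (\textsc{Dromedary-exp-rec-poly})$$
¹ Notation inspired by bidirectional type checking [9].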
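Polymorphic recursion is needed for the nested type α perfect_tree of Section 3.1.1: a function over α perfect_tree must call itself at type (α × α) perfect_tree. The OCaml sketch below assumes a definition matching the constructor types of Leaf and Node given earlier (Listing 2.8 may differ in detail); OCaml’s explicit polymorphic annotation 'a. plays the role of the annotation in let rec x : ∀α.τ1 = e1.

```ocaml
(* Assumed definition, mirroring Leaf : a -> a perfect_tree and
   Node : a * (a * a) perfect_tree -> a perfect_tree. *)
type 'a perfect_tree =
  | Leaf of 'a
  | Node of 'a * ('a * 'a) perfect_tree

(* The recursive call is at type ('a * 'a) perfect_tree rather than
   'a perfect_tree, so the polymorphic annotation 'a. is required. *)
let rec size : 'a. 'a perfect_tree -> int = function
  | Leaf _ -> 1
  | Node (_, t) -> 1 + (2 * size t)
```

Without the 'a. annotation, OCaml would infer a monomorphic type for size and reject the recursive call.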
3.1.3 Semi-explicit First-class Polymorphism
Recall that OCaml (Section 2.2.5) permits programmers to specify first-class polymorphism
explicitly using records with polymorphic fields: creating a record {ℓ = e} introduces a
polymorphic value wrapped in the record, and the field access e.ℓ eliminates a polymorphic
value by instantiating it.
In Dromedary, we extend our formalisation of algebraic data types (Section 3.1.1), adding the
polymorphic fields required to express semi-explicit first-class polymorphic types:
$$\mathsf{type}\ \alpha\, F \cong \prod_{i=1}^{n} \ell_i : \forall \beta_i.\, \tau_i,$$
where the type scheme for the label ℓ in context Ψ is written as:
$$\Psi \vdash \ell : \forall \alpha.\, (\forall \beta.\, \tau) \to \alpha\, F.$$
To ensure that the record field ℓ = e is well-typed, where ℓ : ∀α.(∀β.τ) → α F, we verify that e
has the type τ for some instance of α and all instances of β, since β must be generic. Similarly,
to determine whether e.ℓ is well-typed, we check that e has the type α F for some instance of α and
that e.ℓ has the type τ for some instance of β.
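This reasoning may be expressed using the existential and universal quantification constraints
introduced in Sections 2.1.2 and 3.1.2, respectively, resulting in the label instantiation constraints
ℓ ≤ τ1 → τ2 and ℓ ≤ (∀β.C ⇒ τ1) → τ2, semantically defined as:
$$\ell \le \tau_1 \to \tau_2 \;\simeq\; \exists \alpha, \beta.\, \tau = \tau_1 \wedge \alpha\, F = \tau_2 \quad \text{if } \Psi \vdash \ell : \forall \alpha.\, (\forall \beta.\, \tau) \to \alpha\, F,$$
$$\ell \le (\forall \beta.\, C \Rightarrow \tau_1) \to \tau_2 \;\simeq\; \exists \alpha.\, \alpha\, F = \tau_2 \wedge \forall \beta.\, \tau_1 = \tau \wedge C \quad \text{if } \Psi \vdash \ell : \forall \alpha.\, (\forall \beta.\, \tau) \to \alpha\, F.$$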
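As a result of these constraints, the typing rules for semi-explicit first-class polymorphism are
as follows:
$$\frac{C \Vdash \ell \le \tau_1 \to \tau_2 \qquad C \vdash e : \tau_2}{C \vdash e.\ell : \tau_1}\ (\textsc{Dromedary-exp-field})$$
$$\frac{C \vdash e : \tau_1}{\ell \le (\forall \beta.\, C \Rightarrow \tau_1) \to \tau_2 \vdash \ell = e : \tau_2}\ (\textsc{Dromedary-exp-record})$$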
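In OCaml, the construction and projection cases described above correspond to the following sketch, where poly_id is a hypothetical record type with a single polymorphic field:

```ocaml
(* A record with a polymorphic field: [id] has the type scheme 'a. 'a -> 'a. *)
type poly_id = { id : 'a. 'a -> 'a }

(* Construction {id = e}: e must have type 'a -> 'a for all instances of 'a. *)
let r = { id = (fun x -> x) }

(* Projection e.id: each use instantiates 'a afresh, here at int and string. *)
let use (r : poly_id) = (r.id 1, r.id "one")

(* Rejected by OCaml, since fun x -> x + 1 is not generic in 'a:
     let bad = { id = (fun x -> x + 1) } *)
```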
3.1.4 Sharing
Sharing is the process of removing repeated types and variables; it is a critical technique for
efficient type inference in ML. It also plays a role in more sophisticated type systems such as
ambivalent types (Section 3.1.6) and MLF [44].
In practice, types are shared by representing them as directed acyclic graphs rather than
trees. To illustrate this, the type ('a -> 'a) -> 'a -> 'a is represented by the graphs
depicted in Figure 3.2. Deduplicating repeated types is key to representing exponentially
sized types using a linear graphical representation, as demonstrated in our evaluation (Section
4.3).
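The size gap between trees and graphs can be made concrete with a small OCaml sketch. The type ty below is a hypothetical toy representation, not Dromedary’s; each type t(n+1) = t(n) × t(n) reuses a single shared t(n), so its tree has 2^(n+1) − 1 nodes while its graph has only n + 1 distinct nodes (counted here via physical equality):

```ocaml
(* A toy representation of deep types; sharing is ordinary physical sharing. *)
type ty =
  | Var of string
  | Pair of ty * ty

(* t(0) = 'a and t(n+1) = t(n) * t(n), reusing the same t(n) value twice. *)
let rec build n =
  if n = 0 then Var "a"
  else
    let t = build (n - 1) in
    Pair (t, t)

(* Size of the type viewed as a tree: every occurrence is counted. *)
let rec tree_size = function
  | Var _ -> 1
  | Pair (l, r) -> 1 + tree_size l + tree_size r

(* Number of physically distinct nodes, i.e. the size of the shared graph. *)
let dag_size t =
  let seen = ref [] in
  let rec visit t =
    if not (List.memq t !seen) then begin
      seen := t :: !seen;
      match t with
      | Var _ -> ()
      | Pair (l, r) -> visit l; visit r
    end
  in
  visit t;
  List.length !seen
```

For n = 10, tree_size returns 2047 while dag_size returns 11.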
A formal treatment of sharing requires the concept of a shallow type ψ. In a graph-based
description of types (Figure 3.2), shallow types are the structure of the internal nodes:
$$\psi ::= \alpha\, F,$$
where type variables represent pointers. Because variables (pointers) are specified explicitly, types are
not duplicated. The formal details of graphical types, and of converting between (deep) types τ
and shallow types, are given in Appendices B and C. In type systems such as Dromedary’s, only
the sharing of variables is significant, not the sharing of internal nodes; we explain this
further in Section 3.1.6.
Figure 3.2: The tree-based (left) and graphical (right) representations of the type
('a -> 'a) -> 'a -> 'a.
$$\tau ::= \dots \mid \ell : \tau :: \tau \mid \partial \tau$$
∂τ is an infinite row whose type is τ for every label; ℓ : τ :: r is the row consisting of row r,
except that the type for label ℓ is τ. Type variables used within the context of a row are called row
variables, denoted ρ.
Labels within a row are annotated with presence information: a label is either absent, or
present with type τ, encoded using the unary type former τ present and the nullary former
absent.
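As a simplified illustration of these row forms, the following OCaml sketch (a hypothetical encoding that omits row variables) computes the type that a row assigns to a label: the first matching ℓ : τ :: r applies, and otherwise the uniform type of the trailing ∂τ applies.

```ocaml
(* Hypothetical, simplified type syntax with rows; row variables are omitted. *)
type ty =
  | Int
  | Unit
  | Absent                        (* the nullary former absent *)
  | Present of ty                 (* the unary former: t present *)
  | Row_cons of string * ty * ty  (* the row l : t :: r *)
  | Row_all of ty                 (* the infinite row ∂t *)

(* The type that a row assigns to label [l]; None if the type is not a row. *)
let rec field l = function
  | Row_cons (l', t, r) -> if String.equal l l' then Some t else field l r
  | Row_all t -> Some t
  | Int | Unit | Absent | Present _ -> None

(* A closed variant row: `Nil and `Cons of int are present, all others absent. *)
let closed =
  Row_cons ("Nil", Present Unit, Row_cons ("Cons", Present Int, Row_all Absent))
```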
Variants To encode variants, we use the unary type former Σ, where Σ r denotes the type of
a polymorphic variant with row r. OCaml and Dromedary syntactically hide the rows and row
variables in polymorphic variants, simplifying the types exposed to the programmer.
However, this requires encoding our empty, open and closed polymorphic variants into a row-based
representation before type inference:
to fold and unfold an equi-recursive type infinitely, making comparisons between equi-recursive
types more difficult. Fortunately, by utilising sharing (Section 3.1.4), we may represent equi-
recursive types using directed cyclic graphs. In practice, this is implemented by removing the
occurs-check in unification (Section 3.2.2).
We extend Dromedary’s types τ with aliases and recursive forms to encode the as construct,
since formalising as directly is challenging:
$$\tau ::= \dots \mid \tau\ \mathsf{where}\ \alpha = \tau \mid \mu\alpha.\, \tau$$
We introduce subtyping constraints of the form τ ≤ ‘K [of τ] and τ ≥ ‘K [of τ] to reason about
the upper (<) and lower (>) bounds of polymorphic variants (Section 2.2.6) using constraints.
Constructing a variant ‘K e (the nullary case being analogous) has a fairly elementary rule:
$$\frac{C \Vdash \tau_2 \ge \text{`}K\ \mathsf{of}\ \tau_1 \qquad C \vdash e : \tau_1}{C \vdash \text{`}K\, e : \tau_2}\ (\textsc{Dromedary-exp-variant})$$
The typing rules for match expressions and cases, given in Figure 3.3, are more involved, with
several edge cases. Dromedary-variant-match-closed implements closed pattern matching, where
every variant constructor is explicitly handled in a case. For open pattern matching,
Dromedary-variant-match-open requires a default case of the form _ → e.
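OCaml exhibits both matching disciplines directly. In the sketch below, handling exactly `Nil and `Cons gives a closed argument type, while adding a default case gives an open one; the types OCaml infers are noted in the comments:

```ocaml
(* Closed matching: every handled constructor is listed, so the argument type
   is closed. Inferred type: [< `Cons of 'a | `Nil ] -> int *)
let tag_closed = function
  | `Nil -> 0
  | `Cons _ -> 1

(* Open matching: the default case _ -> ... makes the argument type open.
   Inferred type: [> `Nil ] -> string *)
let describe_open = function
  | `Nil -> "nil"
  | _ -> "other"
```

Only describe_open accepts constructors outside the listed ones, such as `Leaf.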
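$$\frac{C \vdash e : \tau_e \qquad C \Vdash \tau_e \le \text{`}K_i\,[\mathsf{of}\ \tau_i] \qquad \forall 1 \le i \le n.\ C_i \vdash \text{`}K_i\,[p_i] \to e_i : \text{`}K_i\,[\mathsf{of}\ \tau_i] \Rightarrow \tau}{C \wedge \bigwedge_{i=1}^{n} C_i \vdash \mathsf{match}\ e\ \mathsf{with}\ \text{`}K_i\,[p_i] \to e_i : \tau}\ (\textsc{Dromedary-variant-match-closed})$$
$$\frac{C \vdash e : \tau_e \qquad C \Vdash \tau_e \ge \text{`}K_i\,[\mathsf{of}\ \tau_i] \qquad \forall 1 \le i \le n.\ C_i \vdash \text{`}K_i\,[p_i] \to e_i : \text{`}K_i\,[\mathsf{of}\ \tau_i] \Rightarrow \tau \qquad C_{n+1} \vdash e_{n+1} : \tau}{C \wedge \bigwedge_{i=1}^{n+1} C_i \vdash \mathsf{match}\ e\ \mathsf{with}\ (\text{`}K_i\,[p_i] \to e_i \mid \_ \to e_{n+1}) : \tau}\ (\textsc{Dromedary-variant-match-open})$$
$$\frac{[C_1 \vdash p : \tau_1 \leadsto \Delta] \qquad C_2 \vdash e : \tau_2}{[C_1 \wedge \mathsf{def}\ \Delta\ \mathsf{in}]\ C_2 \vdash \text{`}K\,[p] \to e : \text{`}K\,[\mathsf{of}\ \tau_1] \Rightarrow \tau_2}\ (\textsc{Dromedary-variant-case})$$
Figure 3.3: A selection of Dromedary’s polymorphic variant typing rules for pattern matching.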
We remark that Dromedary only type checks shallow patterns for polymorphic variants, namely
patterns of the form ‘K p where p does not include a polymorphic variant, whereas OCaml
supports deep patterns by using exhaustiveness checking [14] to determine whether the
matched variant type τe is closed or open. Exhaustiveness checking in the presence of GADTs
reduces to proof search [15], which is outside the scope of this dissertation due to its
complexity; nonetheless, we foresee no difficulty in incorporating exhaustiveness checking into our
constraint-based approach.
$$\Psi \vdash K : \forall \alpha.\, \exists \beta.\, C \Rightarrow [\tau \to]\ \alpha\, F.$$
For example, one may write the GADT α expr (Listing 2.6) using equality constraints, as shown in
Figure 3.4 and Listing 3.2.
Figure 3.4: The formal definition of the type α expr, originally defined in Listing 2.6.
The novelty of GADTs lies in the constraint C: to use a constructor K e, e must have the type τ,
and the type variables α, β must be instantiated such that the constraint C is satisfied. Pattern
matching now binds local type variables and constraints: if K p matches a value of type α F,
then there exist unknown types β satisfying C, which may be bound in the fragment of p.
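Since Listing 2.6 is not reproduced here, the following OCaml GADT is a hypothetical stand-in for α expr. Read in the style above, a constructor such as Int has a scheme of the form ∀α.(α = int) ⇒ int → α expr, and matching on it makes the local equality a = int available in the branch:

```ocaml
(* A hypothetical expression GADT; the type index records the result type. *)
type _ expr =
  | Int : int -> int expr
  | Bool : bool -> bool expr
  | Add : int expr * int expr -> int expr
  | If : bool expr * 'a expr * 'a expr -> 'a expr

(* [type a.] makes [a] rigid and polymorphic; in each branch the constructor's
   local equality (e.g. a = int for Int) lets the branch return a value of type a. *)
let rec eval : type a. a expr -> a = function
  | Int n -> n
  | Bool b -> b
  | Add (x, y) -> eval x + eval y
  | If (c, t, e) -> if eval c then eval t else eval e
```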
Ambivalent Types Dromedary’s typing discipline for GADTs is based on Garrigue and
Rémy’s ambivalent types [17]. Informally, an ambivalent type ζ is a set of types that are equal
under the local constraints; ambivalent types are used when a type is ambiguous, namely when |ζ| > 1.
An ambivalent type is said to have leaked if its types are no longer equal under the local
constraints. To illustrate this, we consider:
where the equality type eq is given in Listing 2.7. The then branch returns y, with type a,
whereas the else branch returns a value of type int. The resultant type is the ambivalent
type ζ = {a, int}, which represents a type that is either a or int. When exiting the scope of
the match branch, ζ leaks, since the local equality a = int is no longer present in the context!
Ambiguities are eliminated using annotations (Section 3.1.2); for an expression (e : τ ), the
expressions e and (e : τ ) may have differing ambivalent types ζ1 , ζ2 , but τ must be included in
both – to ensure soundness.
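The program under discussion is not reproduced above; the OCaml sketch below is a hypothetical program of the same shape, assuming an equality witness eq like the one in Listing 2.7. The unannotated, ambiguous version appears in a comment, followed by an accepted version whose result type is annotated:

```ocaml
(* Equality witness, assumed to match the eq type of Listing 2.7. *)
type (_, _) eq = Refl : ('a, 'a) eq

(* Without an annotation, the result type of the match would be the
   ambivalent type {a, int}, which leaks out of the branch where a = int
   holds; OCaml is expected to reject this version as ambiguous:
     let g (type a) (w : (a, int) eq) (y : a) b =
       match w with Refl -> if b then y else 0 *)

(* Annotating the result type (here int) removes the ambiguity. *)
let g (type a) (w : (a, int) eq) (y : a) (b : bool) : int =
  match w with Refl -> if b then y else 0
```

Annotating the result as a instead of int would also disambiguate, since both types belong to ζ.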
Ambivalent types rely on sharing (Section 3.1.4) to guarantee the inference of principal types.
When instantiating a type scheme ∀α.ζ without sharing, we lose the information that all copies
of α must be structurally equal since types that are not structurally equal may be equated due
to local equalities. Sharing recovers this information as each copy of α corresponds to the same
node in the graph-based representation of ζ (Section 3.1.4).
Ambivalent Constraints We now present our novel constraint language, extended with
ambivalent types:
$$C ::= \mathsf{true} \mid \mathsf{false} \mid C \wedge C \mid \forall \alpha.\, C \mid \exists \zeta.\, C \mid \zeta = \zeta \mid \psi \subseteq \zeta \mid R \implies C$$
$$\psi ::= \alpha \mid \zeta\, F,$$
where ζ is an ambivalent type variable and ψ is a shallow type, consisting either of a shallow type
former ζ F or a rigid variable α. R ::= true | R ∧ R | τ = τ defines rigid constraints: constraints
consisting solely of equalities between (rigid) types.
We briefly highlight the new constructs of our language. We introduce existential quantifiers
∃ζ.C for ambivalent type variables. We enforce sharing by preventing (deep) types τ from
occurring in constraints. The ζ = ζ constraint is used in lieu of τ = τ , providing a first-order
equality constraint between ambivalent types; and the subset constraint ψ ⊆ ζ, read as: the
ambivalent type ζ includes the type ψ, is used to define explicitly shared types.
In Dromedary, we restrict the local constraints of GADT types to rigid constraints; hence type
schemes for constructors are of the form:
$$\Psi \vdash K : \forall \alpha.\, \exists \beta.\, R \Rightarrow [\tau \to]\ \alpha\, F,$$
where the variables β are considered rigid. This mimics OCaml’s requirement to annotate GADT
types with rigid variables [16]. We may introduce local rigid constraints using the new implication
constraint R ⟹ C. Semantically, implication constraints also ensure that no ambivalent types are
leaked when exiting the scope of the implication.
In practice, our constraint language differs from the presentation here, since we implement
ambivalent types using scoped abbreviations, which provide an efficient (linear) consistency and
leakage check. This is the approach used by OCaml (4.12.0). While we do not explain this encoding
in full, it is important to acknowledge the difference.
Typing Rules We begin by extending the notion of a fragment, introduced in Section 3.1.1,
to generalised fragments. A generalised fragment Θ is a triple, consisting of a context of
existential variables β, a rigid constraint R, and a fragment ∆, written as Θ ::= ∃β.∆ ⇒ R.
These generalised fragments describe all typing information gained from a pattern that includes
GADTs.
The typing rules² (Figure 3.5) extend our presentation of algebraic data types (Section 3.1.1).
Dromedary-pat-construct checks whether the constructor K has the type τ1 → τ2, and binds
local existential variables β and constraints R. The sub-pattern p is checked to have the type τ1,
binding the fragment Θ. With GADTs, β and R must additionally be bound in the fragment
of K p, extending Θ, which we write as ∃β.Θ ⇒ R.
² The typing rules presented here differ from those given in Appendix C due to sharing, which is a (trivial)
technical detail we omit.
Dromedary-pat-tuple requires each pattern pi in the tuple (p1, . . . , pn) of type τ1 × · · · × τn to
have the type τi. Each pattern produces a fragment Θi. The resultant fragment of (p1, . . . , pn)
is the concatenation Θ1 × · · · × Θn of the individual fragments.
In Dromedary-case, the pattern p is checked against the matched type τ1 , giving us the fragment
∃β.∆ ⇒ R. The case body e is then checked to have the type τ2 , under the assumptions of R,
using an implication constraint. The local existential variables β are universally quantified since
they represent unknown local types within C2 . We also note that the constraint C1 , which the
pattern is checked under, also assumes the local constraints R – permitting local constraints to
flow between patterns in tuples.
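$$\frac{C \Vdash K \le \exists \beta.\, \tau_1 \to \tau_2 \Rightarrow R \qquad C \vdash p : \tau_1 \leadsto \Theta}{C \vdash K\, p : \tau_2 \leadsto \exists \beta.\, \Theta \Rightarrow R}\ (\textsc{Dromedary-pat-construct})$$
$$\frac{\forall 1 \le i \le n.\ C_i \vdash p_i : \tau_i \leadsto \Theta_i}{\bigwedge_{i=1}^{n} C_i \vdash (p_1, \dots, p_n) : \tau_1 \times \dots \times \tau_n \leadsto \Theta_1 \times \dots \times \Theta_n}\ (\textsc{Dromedary-pat-tuple})$$
$$\frac{C_1 \vdash p : \tau_1 \leadsto \exists \beta.\, \Delta \Rightarrow R \qquad C_2 \vdash e : \tau_2}{\forall \beta.\, R \implies C_1 \wedge \mathsf{def}\ \Delta\ \mathsf{in}\ C_2 \vdash p \to e : \tau_1 \Rightarrow \tau_2}\ (\textsc{Dromedary-case})$$
Figure 3.5: The relevant typing rules for GADTs from Dromedary’s type system.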
[Figure: overview of the constraint pipeline from Parsetree to Typedtree, with the stages Constraint Generation, Constraints with a Value, Constraint Solving, Union-find and Unification.]
Our explanation of Dromedary’s type inference is organised according to the constraint pipeline
(Figure 1.1), with Sections 3.2.2 and 3.2.3 structured as illustrated above.
3.2.1 Repository Overview
The top-level project directory consists of the source code src/, tests test/ and benchmarks
benchmark/, with additional files for the Dune build system. Table 3.1 gives an overview of the
repository structure. Within src/, Dromedary is split into a parsing library, a constraints
library (Section 3.2.2), and a typing library (Section 3.2.3). My project repository broadly
follows the structure of the OCaml compiler, which should ease interoperability with the OCaml
compiler in the future.
Constraints with a Value The objective of the constraint solver is to determine whether
the constraints are satisfiable or unsatisfiable (true or false). Unfortunately, a satisfiability
answer alone is insufficient for type reconstruction (or elaboration), the process of constructing
the typedtree. Many languages using constraint-based inference, such as Haskell [37], resort to
combining the phases of constraint solving and elaboration. However, this approach violates the SoC
principle that Dromedary adheres to. In [39], Pottier proposes an alternative implementation of
constraints that facilitates solving and elaboration in a modular fashion. To allow elaboration,
Pottier extends constraints to return not only satisfiability information but also values.
This gives rise to the notion of an α-constraint: a constraint which, if satisfiable, produces a
result of type α. In Dromedary, these constraints are represented by the generalised algebraic
data type 'a Constraint.t:
type _ t =
| True : unit t
| Conj : 'a t * 'b t -> ('a * 'b) t
| Eq : variable * variable -> unit t
| ...
| Def : def_binding list * 'a t -> 'a t
| Let :
'a let_binding list * 'b t
-> ('a term_let_binding list * 'b) t
| Return : 'a -> 'a t
| Map : 'a t * ('a -> 'b) -> 'b t
| Decode : variable -> Decoded.Type.t t
For example, the conjunction constraint Conj (C1, C2) returns a pair of values (v1, v2)
composed of the values returned by C1 and C2, respectively. Following Pottier, we also extend our
constraint language with a Map (C, f) construct, which evaluates C to some value v (if
satisfiable) and returns the value f v. We refer the reader to [39] for the complete formal
semantics of constraints with a value. This approach allows Dromedary to express constraint
generation and type reconstruction using the constraint language, in the same place!
The constraints library (Listing 3.4) is parameterised by the notion of an algebra. Informally, an
Algebra specifies the term variables embedded in the constraint language and the structure
of Dromedary’s types. The constraints library provides an abstract type 'a t for constraints
that produce a value of type 'a, a number of combinators for constructing constraints, and a
solve function that solves the constraint, returning either a value of type 'a or an error.
The constraint language is equipped with return, both (&~) and map combinators,
forming an applicative functor (Section 2.2.2). Dromedary makes extensive use of this abstraction
with Jane Street’s ppx_let [4], which provides syntactic sugar for working with applicatives
(and monads):
(* Desugared *)
(exp1 &~ exp2)
>>| fun (exp1, exp2) ->
Texp_app (exp1, exp2)

(* With ppx_let *)
let%map exp1 = exp1
and exp2 = exp2 in
Texp_app (exp1, exp2)
Listing 3.3: Desugared (top) versus ppx_let syntax (bottom) for applicatives (and monads).
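To make the idea of constraints with a value concrete, here is a heavily simplified, self-contained OCaml sketch. It assumes ground types in place of unification variables, so Eq is an equality test rather than unification, and keeps only a few of the constructors of 'a Constraint.t; the combinators &~ and >>| mirror the desugared form in Listing 3.3 but are our own definitions, not Dromedary’s:

```ocaml
(* Toy ground types: no unification variables. *)
type ty = TInt | TBool | TArrow of ty * ty

(* An ['a t] is a constraint that, if satisfiable, produces a value of type ['a]. *)
type _ t =
  | True : unit t
  | Conj : 'a t * 'b t -> ('a * 'b) t
  | Eq : ty * ty -> unit t
  | Return : 'a -> 'a t
  | Map : 'a t * ('a -> 'b) -> 'b t

exception Unsatisfiable

(* Solving either raises Unsatisfiable or returns the constraint's value. *)
let rec solve : type a. a t -> a = function
  | True -> ()
  | Conj (c1, c2) ->
    let v1 = solve c1 in
    let v2 = solve c2 in
    (v1, v2)
  | Eq (t1, t2) -> if t1 = t2 then () else raise Unsatisfiable
  | Return v -> v
  | Map (c, f) -> f (solve c)

(* Applicative combinators: both and map. *)
let ( &~ ) c1 c2 = Conj (c1, c2)
let ( >>| ) c f = Map (c, f)

(* Check int -> int = int -> int and build a result in the same expression. *)
let example =
  (Eq (TArrow (TInt, TInt), TArrow (TInt, TInt)) &~ Return "elaborated")
  >>| fun ((), s) -> s
```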
Constraints, like any intermediate representation, may be optimised. Dromedary uses smart
constructors to perform peephole optimisations on constraints. For instance, the equivalence
∀α.∀β.C ≃ ∀α, β.C may be used to reduce the number of (expensive) generalisation operations
performed. Such optimisations are not possible in OCaml’s type checker.
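A smart constructor performing the quantifier-merging optimisation above might look as follows. The constraint type is a hypothetical, untyped simplification; Dromedary’s actual smart constructors operate on its own constraint representation:

```ocaml
(* A hypothetical, untyped constraint AST. *)
type constr =
  | True
  | Eq of string * string
  | Conj of constr * constr
  | Forall of string list * constr

(* Smart constructor: drop empty quantifiers and merge nested ones using
   forall a. forall b. C = forall a, b. C, so one quantifier (and hence one
   generalisation step) replaces two. *)
let forall vs c =
  match vs, c with
  | [], c -> c
  | vs, Forall (ws, c) -> Forall (vs @ ws, c)
  | vs, c -> Forall (vs, c)

(* Smart constructor: true is the unit of conjunction. *)
let conj c1 c2 =
  match c1, c2 with
  | True, c | c, True -> c
  | _ -> Conj (c1, c2)
```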
module Make (Algebra : Algebra) : sig
(** Abstract type for ['a Constraint.t] *)
type 'a t
...
end

(** The type ['a t] denotes a node within a given disjoint set.
['a] is the type of the value (descriptor) of the node. *)
type 'a t
Tarjan’s union-find data structure (Listing 3.5) implements a family of disjoint sets (equivalence
classes of types), each associated with a descriptor (the representative type), with the
following operations: find t returns the descriptor of the set t; union t1 t2 ~f computes
the union of the sets t1 and t2, merging their descriptors using f.
Dromedary implements a forest-based structure, consisting of a collection of trees, each tree
representing a disjoint set:
type 'a t = 'a node ref
and 'a node =
| Root of { rank : int; desc : 'a }
| Link of 'a t
An 'a node represents a node in a tree (a set): it is either the root of the tree, containing
the rank and descriptor of the set, or an internal node with no data and a pointer to its parent,
known as a link. For quasi-linear time complexity of find and union, we implement path
compression and union by rank [47]; OCaml’s type checker does not implement the latter.
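A minimal sketch of find and union over the node type above, using path compression and union by rank. The operation names follow the description in the text, but make, repr and the argument order passed to f are our assumptions:

```ocaml
type 'a t = 'a node ref
and 'a node =
  | Root of { rank : int; desc : 'a }
  | Link of 'a t

(* A fresh singleton set. *)
let make desc = ref (Root { rank = 0; desc })

(* Follow links to the root, pointing every visited node directly at the root
   (path compression). *)
let rec repr t =
  match !t with
  | Root _ -> t
  | Link parent ->
    let root = repr parent in
    if root != parent then t := Link root;
    root

let find t =
  match !(repr t) with
  | Root { desc; _ } -> desc
  | Link _ -> assert false

(* Union by rank: the root of lower rank is linked below the other one. *)
let union t1 t2 ~f =
  let r1 = repr t1 and r2 = repr t2 in
  if r1 != r2 then
    match !r1, !r2 with
    | Root { rank = k1; desc = d1 }, Root { rank = k2; desc = d2 } ->
      let desc = f d1 d2 in
      if k1 < k2 then (r1 := Link r2; r2 := Root { rank = k2; desc })
      else if k1 > k2 then (r2 := Link r1; r1 := Root { rank = k1; desc })
      else (r2 := Link r1; r1 := Root { rank = k1 + 1; desc })
    | _ -> assert false
```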
Unification and Structures Dromedary extends first-order unification with several non-trivial
extensions: (a) unification under a mixed prefix [32], i.e. in the presence of existential and
universal quantifiers (Section 3.1.2); (b) an unscoped equational context A for type
abbreviations; (c) scopes and scoped equational contexts for ambivalence (Section 3.1.6); and (d)
rows (Section 3.1.5).
Each extension to unification is independent and may thus be implemented modularly, using
the notion of a structure, which describes the descriptor attached to equivalence classes in
unification. The interface for a structure is given in Listing 3.6 and consists of: (a) an abstract
type 'a t for structures, whose children have type 'a; (b) a function merge, used to equate
two structures t1 and t2 of type 'a t within some context ctx of type 'a ctx, returning the
merged structure or raising the exception Cannot_merge if the structures are not compatible;
(c) functorial functions (Section 2.2.2) such as map, iter and fold, used to traverse the
structure and perform element-wise operations.
module type Structure = sig
type 'a t
...
end
Structures may be composed and extended using functors (Section 2.2.1). To illustrate this, we
may define a structure called First_order (Listing 3.7), which extends an arbitrary structure
S with variables.
module First_order (S : Structure) : sig
type 'a t =
| Var
| Structure of 'a S.t
include Structure with type 'a t := 'a t and type 'a ctx = 'a S.ctx
end
Listing 3.7: An example of a composable unification structure using OCaml’s functors: the
structure First_order extends a structure S, adding (uni-sorted) variables.
This graphical definition permits equi-recursive types (Section 3.1.5) and sharing (Section
3.1.4), which is key for efficient unification.
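The following self-contained sketch fills in one possible Structure signature and First_order implementation, together with a toy Arrow structure to instantiate it. The shape of merge, in particular the ~equate callback used to equate children, is our assumption rather than Dromedary’s interface:

```ocaml
(* Hypothetical Structure signature; the ~equate callback is an assumption. *)
module type Structure = sig
  type 'a t
  type 'a ctx
  exception Cannot_merge
  val merge : 'a ctx -> equate:('a -> 'a -> unit) -> 'a t -> 'a t -> 'a t
  val map : ('a -> 'b) -> 'a t -> 'b t
  val iter : ('a -> unit) -> 'a t -> unit
  val fold : ('a -> 'b -> 'b) -> 'a t -> 'b -> 'b
end

(* Extend an arbitrary structure S with (uni-sorted) variables. *)
module First_order (S : Structure) : sig
  type 'a t = Var | Structure of 'a S.t
  include Structure with type 'a t := 'a t and type 'a ctx = 'a S.ctx
end = struct
  type 'a t = Var | Structure of 'a S.t
  type 'a ctx = 'a S.ctx
  exception Cannot_merge = S.Cannot_merge

  (* A variable has no structure, so it merges with anything. *)
  let merge ctx ~equate t1 t2 =
    match t1, t2 with
    | Var, t | t, Var -> t
    | Structure s1, Structure s2 -> Structure (S.merge ctx ~equate s1 s2)

  let map f = function Var -> Var | Structure s -> Structure (S.map f s)
  let iter f = function Var -> () | Structure s -> S.iter f s
  let fold f t acc = match t with Var -> acc | Structure s -> S.fold f s acc
end

(* A toy base structure with int and arrow type formers. *)
module Arrow = struct
  type 'a t = Int | Arrow of 'a * 'a
  type 'a ctx = unit
  exception Cannot_merge

  (* Merging two arrows equates their children pairwise. *)
  let merge () ~(equate : 'a -> 'a -> unit) t1 t2 =
    match t1, t2 with
    | Int, Int -> Int
    | Arrow (a1, b1), Arrow (a2, b2) -> equate a1 a2; equate b1 b2; t1
    | _ -> raise Cannot_merge

  let map f = function Int -> Int | Arrow (a, b) -> Arrow (f a, f b)
  let iter f = function Int -> () | Arrow (a, b) -> f a; f b
  let fold f t acc = match t with Int -> acc | Arrow (a, b) -> f b (f a acc)
end

module FO = First_order (Arrow)
```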
Generalisation In the context of constraint solving, generalisation is the process of simplifying
constrained type schemes ∀α.C ⇒ τ to type schemes ∀β.τ′, which is performed when solving let
constraints.
For approximately linear-time generalisation and instantiation, we implement Rémy’s
efficient rank-based scheme. Each type variable in the constraint is annotated with an integer
level (or rank), which is used to determine the scope of the variable in constant time: variables
with level l are bound in the l-th nested ∀-quantifier in the constrained type schemes of let constraints,
with 0 being the outermost level. For example, the following depicts the levels within the
generated constraint of the expression let id = fun x → x in id:
Figure 3.7: A visualisation of rank-based generalisation [41] for the generated constraint of
let id = fun x → x in id.
The essential observation is that we cannot generalise variables bound in the enclosing scope,
such as α0 , since they may be equated after we exit the current scope – namely in the in id ≤ α0
portion of the above constraint.
When equating two type variables during unification, we lower both of their levels to the minimum
of the two (the outermost scope). Thus, when generalising (exiting the l-th level), we only generalise
variables whose level is greater than or equal to l, i.e. variables that are not bound in an enclosing scope.
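The level discipline can be sketched in a few lines of OCaml. This hypothetical sketch tracks only levels (a real solver would also merge union-find classes when equating variables) and shows that exiting level l generalises exactly the variables whose level is at least l:

```ocaml
(* Hypothetical type variables carrying a mutable level (rank). *)
type var = { id : int; mutable level : int }

let current_level = ref 0
let counter = ref 0

(* Fresh variables are created at the current level. *)
let fresh () =
  incr counter;
  { id = !counter; level = !current_level }

(* Equating two variables lowers both to the outermost (minimum) level. *)
let equate v1 v2 =
  let l = min v1.level v2.level in
  v1.level <- l;
  v2.level <- l

(* Entering the left-hand side of a let increases the level. *)
let enter () = incr current_level

(* On exiting level l, only variables whose level is >= l are generalised. *)
let exit_and_generalise vars =
  let l = !current_level in
  decr current_level;
  List.filter (fun v -> v.level >= l) vars
```

A variable equated with one from an enclosing scope, like α0 above, drops to the outer level and is therefore not generalised.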
A τ computation computes a value of type τ within the context required for constraint generation.
Within computations, one may define the notion of a binder, which represents a context for
binding variables within constraints; for example, ∃α.[·] is a binding context for the variable
α with a ‘hole’ (represented by the binding command exists). Computations and binders both
form monads (Section 2.2.2), with bind (let commands) and return operations.
The computation bind x = u; t applies the binder u of type τ, binding x to its value, and fills u’s
‘hole’ with the constraint returned by the computation t. The binder sub x = t; u evaluates the
computation t of type τ, binds its value to x and returns the binder u. The DSL is shallowly
embedded in OCaml using ppx_let and OCaml’s binding operators for bind and sub
(let@ and let&, respectively). See Appendix D for the complete embedding.
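A binder can be approximated as a constraint with a hole, i.e. a function from the hole’s contents to a constraint. The sketch below assumes this simplified model and a hypothetical constraint type; Dromedary’s binders additionally live inside the computation monad, which is omitted here. It defines exists and the let@ operator so that nested binders produce nested existential quantifiers:

```ocaml
(* A hypothetical, untyped constraint syntax. *)
type constr =
  | Exists of string * constr
  | Eq of string * string

(* A binder is modelled as a constraint with a hole: given a function that
   builds the hole's contents from the bound value, it yields a constraint. *)
type 'a binder = ('a -> constr) -> constr

let counter = ref 0

(* exists () binds a fresh variable a, i.e. the binding context exists a.[.] *)
let exists () : string binder =
 fun k ->
  incr counter;
  let a = Printf.sprintf "a%d" !counter in
  Exists (a, k a)

(* let@ x = b in body fills the hole of the binder b with body. *)
let ( let@ ) (b : 'a binder) (k : 'a -> constr) : constr = b k

let example =
  let@ a = exists () in
  let@ b = exists () in
  Eq (a, b)
```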
Constraint Generation Dromedary’s constraint generation utilises α-constraints and computations
to express constraint generation and type reconstruction together, resulting in concise,
compositional and maintainable code that naturally reflects the formal constraint mapping
⟦e : τ⟧ (Figure 2.3).
For example, the following snippet generates constraints for the application exp1 exp2
(following the definition in Figure 2.3) and constructs the corresponding typedtree fragment
Texp_app (exp1, exp2) in the same code segment:
| Pexp_app (exp1, exp2) ->
(* bind [var] existentially *)
let@ var = exists () in
(* check [exp1] has type [var -> exp_type];
and [exp2] has type [var] *)
let%bind exp1 = lift (infer_exp exp1) (var @-> exp_type) in
let%bind exp2 = infer_exp exp2 var in
return
(let%map exp1 = exp1
and exp2 = exp2 in
Texp_app (exp1, exp2))
Listing 3.8: A snippet of Dromedary’s constraint generation illustrating the usage of constraints,
computations, and binders for clear, compositional, and maintainable code.
3.3 Summary
This chapter began by introducing Dromedary’s type system, the first unified presentation of
OCaml’s type system in a constraint-based setting, which we believe to be (a) more natural than
other presentations for certain features, such as GADTs; and (b) better suited to correctness proofs
and formal verification of the type checker, an ongoing field of research [11, 6]. We discussed
various advanced type system features of Dromedary and their constraint-based formalisation,
which required many novel extensions to the constraint language. The author wishes to emphasise
that Dromedary implements additional features not covered in this section³, including
abstract types, side-effecting primitives, type abbreviations, extensible variants, and structures.
Having discussed the theoretical aspects of Dromedary’s type system and its features, we
explored the practical implementation of Dromedary’s type inference algorithm, focusing on the
mechanisms that allow Dromedary to maintain SoC. Dromedary’s constraints library
is fundamentally modular, while solving constraints in quasi-linear time⁴, provided type
schemes have bounded size [30]. The typing library, which implements Dromedary’s constraint-based
inference, was designed for clarity and correctness, permitting the effortless description of
constraints and type reconstruction using computations.
³ Due to the page limit.
⁴ Not formally analysed.
4 Evaluation
In this chapter, I evaluate whether the implementation of Dromedary fulfilled the success
criteria outlined in the project proposal (Appendix E). In Section 4.1, I demonstrate that
Dromedary far exceeds the success criteria. Following this, in Section 4.2, I explore the
permissiveness of Dromedary’s type system in comparison to OCaml’s, empirically showing that
Dromedary is as permissive as OCaml. Finally, in Section 4.3, I show that Dromedary outperforms
OCaml in our benchmarks.
Design the constraint language for Dromedary I successfully designed a constraint
language capable of expressing all of Dromedary’s features (Section 3.1), often requiring
novel extensions to previous work (Section 2.1.2).
Implement a constraint-based inference algorithm for Dromedary Not only did I
implement a constraint-based inference algorithm for Dromedary, but one that is
fundamentally more modular and more performant than OCaml’s inference algorithm!
Evaluate the permissiveness and efficiency of Dromedary I used the Jane Street Expect
Test and Core Bench libraries to evaluate the permissiveness and performance of Dromedary’s
inference algorithm, performing 427 experiments.
Results We completed a total of 412 tests, summarised in Table 4.1. Every test showed
that OCaml and Dromedary are equally permissive on the implemented features. We briefly
discuss our tests and results in two categories: tests from the OCaml test suite, and tests
using examples from other sources:
OCaml testsuite: Of the 631 relevant tests for semi-explicit first-class polymorphism,
GADTs and polymorphic recursion in the OCaml test suite, 283 could be expressed in
Dromedary.
All of the remaining 348 tests relied on features not supported by
Dromedary:
• 16% (57) involved interactions with the module system,
• 46% (159) relied on objects and classes,
• 38% (132) involved other miscellaneous features of OCaml that are not supported in
Dromedary.
Other sources: Since OCaml’s test suite lacked representative tests for Core ML features and
polymorphic variants, we relied on other sources for examples.
For Core ML features, we curated a corpus of 111 programs using examples from
Whitington’s ‘OCaml from the Very Beginning’ [40], Paulson’s ‘ML for the Working
Programmer’ [23] and the Foundations of Computer Science lecture notes [1]. Similarly, for
polymorphic variants, we used examples from ‘Real World OCaml’ [34] and various papers
on polymorphic variants [12, 42], resulting in 18 additional programs.
Dromedary correctly type checked all of these programs.
In practice, we found that translating programs between Dromedary and OCaml only requires
minor changes where the syntax differs, for example in scoped annotations (Section
3.1.2). In total, we translated approximately 4100 lines of OCaml to Dromedary.
Since Dromedary passes all tests relevant to its type system features, we conclude that Dromedary is as permissive as OCaml in the implemented features. These are encouraging results, suggesting that the constraint-based approach used in Dromedary could be integrated with OCaml without significant backwards-compatibility issues, and supporting the practicality of our work.
Given that we performed 412 tests, we also view these results as empirical evidence for the correctness of Dromedary’s type system and its implementation.
4.3 Benchmarks
In this section, we discuss the efficiency and asymptotic behaviour of Dromedary, substantiating our claim that Dromedary implements quasi-linear constraint solving and showing that Dromedary is more performant than OCaml.
Methodology OCaml programs are type-checked using the OCaml compiler. I ensure that the benchmarks are comparable by modifying OCaml’s inference algorithm so that it only type checks the relevant features, for example by disabling inference for objects, classes and modules. This was achieved by forking the OCaml compiler, removing many unnecessary libraries and modules, and replacing certain functions within the implementation with stubs.
Since Dromedary infers principal types and permits equi-recursive types (Section 3.1.5), OCaml’s -principal and -rectypes compiler flags are enabled.
Feature / test-suite file     OCaml   Dromedary
Core ML
  whitington.ml                  51          51
  paulson.ml                      5           5
  focs.ml                        22          22
  infer_core.ml                  33          33
Semi-explicit first-class polymorphism
  poly.ml                       141           9
  pr7636.ml                       3           2
  pr9603.ml                       2           0
  error_messages.ml              10           0
Polymorphic recursion
  poly.ml                         5           5
GADTs
  ambiguity.ml                   16          13
  ambivalent_apply.ml             3           3
  didier.ml                       7           5
  dynamic_frisch.ml              24          24
  gadthead.ml                     2           0
  name_existentials.ml           12          12
  nested_equations.ml             8           2
  omega07.ml                     56          56
  or_patterns.ml                 58           0
  term_conv.ml                    5           5
  unify_mb.ml                    14          14
  principality_and_gadts.ml      38          19
  return_type.ml                  3           0
  yallop_bugs.ml                  4           0
  unexpected_existentials.ml     16           2
  test.ml                        84          56
  pr*.ml                        120          56
Polymorphic variants
  docs.ml                         4           4
  garrigue.ml                     5           5
  real_world_ocaml.ml             7           7
  remy.ml                         2           2
Table 4.1: A summary of tests in each file for OCaml and Dromedary, consisting of 412 tests in total for Dromedary.
[Figure 4.1: bar chart of Time (µs) for each of the Programs; legend: Dromedary, OCaml.]
Figure 4.1: Benchmarks of various programs using 10000 trials. The programs are a subset of the corpus used for permissiveness testing. Error bars represent ±2σ.
[Figure 4.2: (a) Time (µs) on a linear scale and (b) Time (µs) on a logarithmic scale, for Dromedary and OCaml as the input parameter n grows.]
Figure 4.2: Benchmarks comparing Dromedary’s and OCaml’s asymptotic behaviour in classical exponential cases for ML inference. Shaded areas represent the 95% confidence interval (±2σ). 10000 trials for (a), 200 trials for (b).
For the benchmark of each feature, we selected random programs from our permissiveness tests. We used programs of our own devising to examine the asymptotic behaviour of Dromedary and OCaml.
The benchmarks are automated using the Core_bench micro-benchmarking library [3]. Measurements are split into samples, and linear regression over the samples is used to predict the execution time. The primary source of non-determinism in the benchmarks is garbage collection (GC), whose effects we minimise by stabilising the GC between benchmarks. We use a bootstrapping phase consisting of 10% of the trials to achieve tight error bounds. Measurements were collected on my personal computer with the following specification:
Processor: Intel Core i7-8700 @ 3.20GHz
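The regression step mentioned above can be illustrated with a minimal sketch. Assuming, for illustration, that each sample is a pair (number of runs in the batch, total batch time in µs), an ordinary least-squares fit gives the per-run cost as the slope, while the intercept absorbs any fixed per-batch overhead. This is not Core_bench’s implementation, whose estimator and defaults may differ; the function name ols is hypothetical.

```ocaml
(* Each sample: (runs in the batch, total elapsed time for the batch in µs).
   Needs at least two samples with distinct run counts. *)
let ols (samples : (float * float) list) : float * float =
  let n = float_of_int (List.length samples) in
  let sx = List.fold_left (fun a (x, _) -> a +. x) 0. samples in
  let sy = List.fold_left (fun a (_, y) -> a +. y) 0. samples in
  let sxx = List.fold_left (fun a (x, _) -> a +. (x *. x)) 0. samples in
  let sxy = List.fold_left (fun a (x, y) -> a +. (x *. y)) 0. samples in
  (* slope: estimated time per run; intercept: constant per-batch overhead *)
  let slope = ((n *. sxy) -. (sx *. sy)) /. ((n *. sxx) -. (sx *. sx)) in
  let intercept = (sy -. (slope *. sx)) /. n in
  (slope, intercept)
```

For samples lying exactly on time = 3 × runs + 5, for example (1, 8), (2, 11), (4, 17) and (8, 29), the fit recovers a per-run cost of 3 µs and an overhead of 5 µs.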
• Dromedary is better optimised owing to its more modular approach, specifically in unification and generalisation, where it uses the union-by-rank optimisation (Section 3.2.2) and more compact and efficient data structures for generalisation.
Some of these optimisations are possible within OCaml’s existing approach; nevertheless, implementing them would be technically demanding owing to the fragility and complexity of the type checker in its present state.
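For readers unfamiliar with union by rank, the following is a generic array-based union-find sketch with path compression, in the style analysed by Tarjan [47]. It is illustrative only and is not Dromedary’s unifier, whose data structures are described in Section 3.2.2; the integer-indexed representation is an assumption made for brevity.

```ocaml
(* A union-find structure over the integers 0 .. n-1. *)
type t = { parent : int array; rank : int array }

let create n = { parent = Array.init n (fun i -> i); rank = Array.make n 0 }

(* Find the representative of [x], compressing the path on the way back. *)
let rec find uf x =
  let p = uf.parent.(x) in
  if p = x then x
  else begin
    let root = find uf p in
    uf.parent.(x) <- root;
    root
  end

(* Merge the classes of [x] and [y]: the lower-rank root goes under the other. *)
let union uf x y =
  let rx = find uf x and ry = find uf y in
  if rx <> ry then
    if uf.rank.(rx) < uf.rank.(ry) then uf.parent.(rx) <- ry
    else if uf.rank.(rx) > uf.rank.(ry) then uf.parent.(ry) <- rx
    else begin
      uf.parent.(ry) <- rx;
      uf.rank.(rx) <- uf.rank.(rx) + 1
    end
```

Because the lower-rank root is always attached beneath the higher-rank one, a tree of rank r contains at least 2^r nodes, so ranks stay logarithmic in the number of elements even before path compression.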
Figure 4.2 compares the asymptotic behaviour of OCaml and Dromedary. Dromedary consistently outperforms OCaml in these benchmarks, most noticeably for smaller values of n. Asymptotically, however, the two behave comparably. Benchmark (a) measures the inference of the expression:

let id = fun x -> x in id id ··· id    (where id id ··· id consists of n occurrences of id)
This expression yields types of exponentially increasing size within the typedtree representation. However, Dromedary and OCaml both type check the expression in quasi-linear time, as seen in Figure 4.2 (a), owing to their use of sharing (Section 3.2.2). This corroborates our claim that Dromedary solves constraints in quasi-linear time².
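To see why sharing matters, consider a simplified model (our own, not taken from Dromedary’s implementation) of the types arising in benchmark (a), in which each step wraps the previous type t as t → t. Built with physical sharing, the type occupies a number of nodes linear in n, whereas the same type read as a tree has 2^(n+1) − 1 nodes.

```ocaml
type ty = Var | Arrow of ty * ty

(* t_0 = α and t_(k+1) = t_k -> t_k, reusing the same node for both sides. *)
let rec build k = if k = 0 then Var else let t = build (k - 1) in Arrow (t, t)

(* Size of the type read as a tree: a shared node is counted once per occurrence. *)
let rec tree_size = function
  | Var -> 1
  | Arrow (t1, t2) -> 1 + tree_size t1 + tree_size t2

(* Number of physically distinct nodes, i.e. the size of the shared graph.
   List.memq makes this quadratic, which is fine for a demonstration. *)
let dag_size t =
  let seen = ref [] in
  let rec visit t =
    if not (List.memq t !seen) then begin
      seen := t :: !seen;
      match t with
      | Var -> ()
      | Arrow (t1, t2) -> visit t1; visit t2
    end
  in
  visit t;
  List.length !seen
```

For n = 20, tree_size (build 20) is 2^21 − 1 = 2,097,151, while dag_size (build 20) is 21: a representation with sharing never materialises the exponential tree.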
² Under certain conditions [30].

In benchmark (b), we experiment with exponentially sized type schemes, which lead to exponential running time, using the expression:
let pair x f = f x x in
let f0 x = pair x in
let f1 x = f0 (f0 x) in
...
let fₙ x = fₙ₋₁ (fₙ₋₁ x) in
fun z -> fₙ (fun x -> x) z
This demonstrates that both Dromedary and OCaml suffer from the exponential complexity of ML inference [24] when type schemes are unbounded, which no amount of optimisation can prevent.
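Both benchmark families are parameterised by n, so their sources can be generated mechanically. The helpers below are hypothetical (the dissertation does not describe how its benchmark inputs were produced); they build the program text for benchmarks (a) and (b) in OCaml syntax.

```ocaml
(* Benchmark (a): let id = fun x -> x in id id ... id, with n occurrences of id. *)
let bench_a n =
  "let id = fun x -> x in " ^ String.concat " " (List.init n (fun _ -> "id"))

(* Benchmark (b): the chain f0, ..., fn, where each fi applies f(i-1) twice. *)
let bench_b n =
  let buf = Buffer.create 256 in
  Buffer.add_string buf "let pair x f = f x x in\n";
  Buffer.add_string buf "let f0 x = pair x in\n";
  for i = 1 to n do
    Buffer.add_string buf (Printf.sprintf "let f%d x = f%d (f%d x) in\n" i (i - 1) (i - 1))
  done;
  Buffer.add_string buf (Printf.sprintf "fun z -> f%d (fun x -> x) z" n);
  Buffer.contents buf
```

For instance, bench_a 3 yields let id = fun x -> x in id id id, and bench_b 1 yields the listing above with n = 1.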
4.4 Summary
Dromedary exceeded all success criteria, achieving all core requirements and extensions listed
in Section 2.3. In our benchmarks, Dromedary outperformed OCaml, demonstrating the
practicality of a constraint-based approach. Additionally, our findings indicate that Dromedary
and OCaml share the same asymptotic quasi-linear time complexity for inference².
Comparing the permissiveness of Dromedary and OCaml, our results show that Dromedary’s type system offers equal expressivity in the implemented features. OCaml and Dromedary programs differed only in minor syntactic features, with all OCaml programs using the implemented features successfully translated into Dromedary programs. Notably, this suggests that our type system and constraint-based approach to inference could be backwards-compatible with the existing OCaml type checker; however, this is not formally proved. Our permissiveness experiments also provide empirical evidence for the correctness of Dromedary’s type system and its type checker.
5 Conclusions
The project was a resounding success, surpassing all core success criteria and completing many
of the planned extensions.
This project set out to develop a type inference algorithm for a subset of OCaml using a
constraint-based approach, designed to address the fragility and unnecessary complexity of the
current OCaml type checker.
I introduced Dromedary, a substantial subset of OCaml, whose type system I designed (Section
3.1) based on the PCB type system. I developed an ergonomic constraint language capable of
expressing many advanced type system features in OCaml, with modular constraint solving and
elaboration (Section 3.2).
Dromedary’s implementation was designed with the separation of concerns principle in mind,
which we believe improves clarity, modularity, extensibility and maintainability over the existing
OCaml type checker – an original aim of the project.
I established experimentally that Dromedary is as permissive as OCaml in the implemented features (Section 4.2). Additionally, I demonstrated that Dromedary outperforms OCaml (Section 4.3), supporting the practicality of a constraint-based approach.
Bibliography
[1] Jeremy Yallop and Anil Madhavapeddy. Foundations of Computer Science (2021-2022) Course Notes. url: https://www.cl.cam.ac.uk/teaching/2122/FoundsCS/focs-202122-v1.3.pdf.
[2] Anton Bachin. The Bisect ppx code coverage tool. 2022. url: https://github.com/aantron/bisect_ppx.
[3] Jane Street Capital. Core bench micro-benchmarking framework. 2022. url: https://github.com/janestreet/core_bench.
[4] Jane Street Capital. ppx let preprocessor. 2022. url: https://github.com/janestreet/ppx_let.
[5] Dai Clegg and Richard Barker. CASE method fast-track - a RAD approach. Addison-Wesley, 1994. isbn: 978-0-201-62432-8.
[6] COCTI: Certificable OCaml Type Inference. 2022. url: https://www.math.nagoya-u.ac.jp/~garrigue/cocti/.
[7] Coveralls.io. 2022. url: https://coveralls.io/.
[8] Simon Cruanes. The QCheck testing framework. 2022. url: https://github.com/c-cube/qcheck.
[9] Jana Dunfield and Neelakantan R. Krishnaswami. “Complete and easy bidirectional typechecking for higher-rank polymorphism”. In: ACM SIGPLAN International Conference on Functional Programming, ICFP’13, Boston, MA, USA - September 25 - 27, 2013. Ed. by Greg Morrisett and Tarmo Uustalu. ACM, 2013, pp. 429–442. doi: 10.1145/2500365.2500582. url: https://doi.org/10.1145/2500365.2500582.
[10] Dirk Dussart, Fritz Henglein, and Christian Mossin. “Polymorphic Recursion and Subtype Qualifications: Polymorphic Binding-Time Analysis in Polynomial Time”. In: Static Analysis, Second International Symposium, SAS’95, Glasgow, UK, September 25-27, 1995, Proceedings. Ed. by Alan Mycroft. Vol. 983. Lecture Notes in Computer Science. Springer, 1995, pp. 118–135. doi: 10.1007/3-540-60360-3_36. url: https://doi.org/10.1007/3-540-60360-3_36.
[11] Jacques Garrigue. “A certified implementation of ML with structural polymorphism and recursive types”. In: Math. Struct. Comput. Sci. 25.4 (2015), pp. 867–891. doi: 10.1017/S0960129513000066. url: https://doi.org/10.1017/S0960129513000066.
[12] Jacques Garrigue. “Programming with polymorphic variants”. In: ML Workshop. Vol. 13. 7. Baltimore. 1998.
[13] Jacques Garrigue. “Simple Type Inference for Structural Polymorphism”. In: The Second Asian Workshop on Programming Languages and Systems, APLAS’01, Korea Advanced Institute of Science and Technology, Daejeon, Korea, December 17-18, 2001, Proceedings. 2001, pp. 329–343.
[14] Jacques Garrigue. “Typing deep pattern-matching in presence of polymorphic variants”. In: JSSST Workshop on Programming and Programming Languages. Citeseer. 2004. url: https://caml.inria.fr/pub/papers/garrigue-deep-variants-2004.pdf.
[15] Jacques Garrigue and Jacques Le Normand. “GADTs and Exhaustiveness: Looking for the Impossible”. In: Proceedings ML Family / OCaml Users and Developers workshops, ML Family/OCaml 2015, Vancouver, Canada, 3rd & 4th September 2015. Ed. by Jeremy Yallop and Damien Doligez. Vol. 241. EPTCS. 2015, pp. 23–35. doi: 10.4204/EPTCS.241.2. url: https://doi.org/10.4204/EPTCS.241.2.
[16] Jacques Garrigue and Jacques Le Normand. “Adding GADTs to OCaml: the direct approach”. In: Workshop on ML. 2011.
[17] Jacques Garrigue and Didier Rémy. “Ambivalent Types for Principal Type Inference with GADTs”. In: Programming Languages and Systems - 11th Asian Symposium, APLAS 2013, Melbourne, VIC, Australia, December 9-11, 2013. Proceedings. Ed. by Chung-chieh Shan. Vol. 8301. Lecture Notes in Computer Science. Springer, 2013, pp. 257–272. doi: 10.1007/978-3-319-03542-0_19. url: https://doi.org/10.1007/978-3-319-03542-0_19.
[18] Jacques Garrigue and Didier Rémy. “Semi-Explicit First-Class Polymorphism for ML”. In: Inf. Comput. 155.1-2 (1999), pp. 134–169. doi: 10.1006/inco.1999.2830. url: https://doi.org/10.1006/inco.1999.2830.
[19] GitHub. GitHub project board. 2022. url: https://docs.github.com/en/issues/organizing-your-work-with-project-boards/managing-project-boards/about-project-boards.
[20] Fritz Henglein. “Type Inference with Polymorphic Recursion”. In: ACM Trans. Program. Lang. Syst. 15.2 (1993), pp. 253–289. doi: 10.1145/169701.169692. url: https://doi.org/10.1145/169701.169692.
[21] Gérard P. Huet. “A Unification Algorithm for Typed lambda-Calculus”. In: Theor. Comput. Sci. 1.1 (1975), pp. 27–57. doi: 10.1016/0304-3975(75)90011-0. url: https://doi.org/10.1016/0304-3975(75)90011-0.
[22] The Open Source Initiative. The MIT licence. url: https://opensource.org/licenses/MIT.
[23] Barry L. Ives. “ML for the Working Programmer by L. C. Paulson (Cambridge University Press, 1996)”. In: ACM SIGSOFT Softw. Eng. Notes 22.4 (1997), p. 114. doi: 10.1145/263244.773584. url: https://doi.org/10.1145/263244.773584.
[24] A. J. Kfoury, Jerzy Tiuryn, and Pawel Urzyczyn. “ML Typability is DEXPTIME-Complete”. In: CAAP ’90, 15th Colloquium on Trees in Algebra and Programming, Copenhagen, Denmark, May 15-18, 1990, Proceedings. Ed. by André Arnold. Vol. 431. Lecture Notes in Computer Science. Springer, 1990, pp. 206–220. doi: 10.1007/3-540-52590-4_50. url: https://doi.org/10.1007/3-540-52590-4_50.
[25] A. J. Kfoury, Jerzy Tiuryn, and Pawel Urzyczyn. “Type Reconstruction in the Presence of Polymorphic Recursion”. In: ACM Trans. Program. Lang. Syst. 15.2 (1993), pp. 290–311. doi: 10.1145/169701.169687. url: https://doi.org/10.1145/169701.169687.
[26] Philip A Laplante. What every engineer should know about software engineering. CRC Press, 2007.
[27] Konstantin Läufer and Martin Odersky. “Polymorphic Type Inference and Abstract Data Types”. In: ACM Trans. Program. Lang. Syst. 16.5 (1994), pp. 1411–1430. doi: 10.1145/186025.186031. url: https://doi.org/10.1145/186025.186031.
[28] Xavier Leroy. “A modular module system”. In: J. Funct. Program. 10.3 (2000), pp. 269–303. url: http://journals.cambridge.org/action/displayAbstract?aid=54525.
[29] Xavier Leroy. The ZINC experiment: an economical implementation of the ML language. Technical report 117. INRIA, 1990. url: https://xavierleroy.org/publi/ZINC.pdf.
[30] David A. McAllester. “Joint RTA-TLCA Invited Talk: A Logical Algorithm for ML Type Inference”. In: Rewriting Techniques and Applications, 14th International Conference, RTA 2003, Valencia, Spain, June 9-11, 2003, Proceedings. Ed. by Robert Nieuwenhuis. Vol. 2706. Lecture Notes in Computer Science. Springer, 2003, pp. 436–451. doi: 10.1007/3-540-44881-0_31. url: https://doi.org/10.1007/3-540-44881-0_31.
[31] Conor McBride and Ross Paterson. “Applicative programming with effects”. In: J. Funct. Program. 18.1 (2008), pp. 1–13. doi: 10.1017/S0956796807006326. url: https://doi.org/10.1017/S0956796807006326.
[32] Dale Miller. “Unification Under a Mixed Prefix”. In: J. Symb. Comput. 14.4 (1992), pp. 321–358. doi: 10.1016/0747-7171(92)90011-R. url: https://doi.org/10.1016/0747-7171(92)90011-R.
[33] Robin Milner. “A Theory of Type Polymorphism in Programming”. In: J. Comput. Syst. Sci. 17.3 (1978), pp. 348–375. doi: 10.1016/0022-0000(78)90014-4. url: https://doi.org/10.1016/0022-0000(78)90014-4.
[34] Yaron Minsky, Anil Madhavapeddy, and Jason Hickey. Real World OCaml - Functional Programming for the Masses. O’Reilly, 2013. isbn: 978-1-4493-2391-2. url: http://shop.oreilly.com/product/0636920024743.do#tab_04_2.
[35] Alan Mycroft. “Polymorphic Type Schemes and Recursive Definitions”. In: International Symposium on Programming, 6th Colloquium, Toulouse, France, April 17-19, 1984, Proceedings. Ed. by Manfred Paul and Bernard Robinet. Vol. 167. Lecture Notes in Computer Science. Springer, 1984, pp. 217–228. doi: 10.1007/3-540-12925-1_41. url: https://doi.org/10.1007/3-540-12925-1_41.
[36] Chris Okasaki. Purely functional data structures. Cambridge University Press, 1999. isbn: 978-0-521-66350-2.
[37] Simon Peyton Jones. Type inference as constraint solving: how GHC’s type inference engine actually works. Zurihac keynote talk. June 2019. url: https://www.microsoft.com/en-us/research/publication/type-inference-as-constraint-solving-how-ghcs-type-inference-engine-actually-works/.
[38] Benjamin C. Pierce. Advanced Topics in Types and Programming Languages. 2005.
[39] François Pottier. “Hindley-Milner elaboration in applicative style: functional pearl”. In: Proceedings of the 19th ACM SIGPLAN international conference on Functional programming, Gothenburg, Sweden, September 1-3, 2014. Ed. by Johan Jeuring and Manuel M. T. Chakravarty. ACM, 2014, pp. 203–212. doi: 10.1145/2628136.2628145. url: https://doi.org/10.1145/2628136.2628145.
[40] Prabhakar Ragde. “OCaml from the Very Beginning, by John Whitington, Coherent Press, 2013. ISBN-10: 0957671105 (paperback), 204 pp”. In: J. Funct. Program. 23.3 (2013), pp. 352–354. doi: 10.1017/S0956796813000087. url: https://doi.org/10.1017/S0956796813000087.
[41] Didier Rémy. Extending ML Type System with a Sorted Equational Theory. Research Report 1766. Rocquencourt, BP 105, 78 153 Le Chesnay Cedex, France: Institut National de Recherche en Informatique et Automatisme, 1992. url: http://gallium.inria.fr/~remy/ftp/eq-theory-on-types.pdf.
[42] Didier Rémy. “Typechecking Records and Variants in a Natural Extension of ML”. In: Conference Record of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, Austin, Texas, USA, January 11-13, 1989. ACM Press, 1989, pp. 77–88. doi: 10.1145/75277.75284. url: https://doi.org/10.1145/75277.75284.
[43] Didier Rémy and Jerome Vouillon. “Objective ML: An Effective Object-Oriented Extension to ML”. In: Theory Pract. Object Syst. 4.1 (1998), pp. 27–50.
[44] Didier Rémy and Boris Yakobowski. “From ML to MLF: graphic type constraints with efficient type inference”. In: Proceeding of the 13th ACM SIGPLAN international conference on Functional programming, ICFP 2008, Victoria, BC, Canada, September 20-28, 2008. Ed. by James Hook and Peter Thiemann. ACM, 2008, pp. 63–74. doi: 10.1145/1411204.1411216. url: https://doi.org/10.1145/1411204.1411216.
[45] W. W. Royce. “Managing the Development of Large Software Systems: Concepts and Techniques”. In: Proceedings, 9th International Conference on Software Engineering, Monterey, California, USA, March 30 - April 2, 1987. Ed. by William E. Riddle, Robert M. Balzer, and Kouichi Kishida. ACM Press, 1987, pp. 328–339. url: http://dl.acm.org/citation.cfm?id=41801.
[46] Martin Sulzmann et al. Type inference for GADTs via Herbrand constraint abduction. Tech. rep. Jan. 2008. url: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.4392.
[47] Robert Endre Tarjan. “Efficiency of a Good But Not Linear Set Union Algorithm”. In: J. ACM 22.2 (1975), pp. 215–225. doi: 10.1145/321879.321884. url: https://doi.org/10.1145/321879.321884.
[48] The Dune Team. OCaml Dune build system. 2022. url: https://dune.build/.
[49] The LexiFi Team. Landmarks profiling framework. 2022. url: https://github.com/LexiFi/landmarks.
[50] The Mirage Team. Alcotest testing framework. 2022. url: https://github.com/mirage/alcotest.
[51] The OCaml Team. The OCaml Compiler Testsuite. url: https://github.com/ocaml/ocaml/tree/trunk/testsuite.
[52] The OCaml Team. TODO for the OCaml type-checker implementation. 2020. url: https://github.com/ocaml/ocaml/blob/4.12.0/typing/TODO.md.
[53] The OPAM Team. OCaml package manager OPAM. 2022. url: https://opam.ocaml.org/.
[54] Mads Tofte and Jean-Pierre Talpin. “Implementation of the Typed Call-by-Value lambda-Calculus using a Stack of Regions”. In: Conference Record of POPL’94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Portland, Oregon, USA, January 17-21, 1994. Ed. by Hans-Juergen Boehm, Bernard Lang, and Daniel M. Yellin. ACM Press, 1994, pp. 188–201. doi: 10.1145/174675.177855. url: https://doi.org/10.1145/174675.177855.
[55] Philip Wadler. “Comprehending Monads”. In: Math. Struct. Comput. Sci. 2.4 (1992), pp. 461–493. doi: 10.1017/S0960129500001560. url: https://doi.org/10.1017/S0960129500001560.
[56] Hongwei Xi, Chiyan Chen, and Gang Chen. “Guarded recursive datatype constructors”. In: Conference Record of POPL 2003: The 30th SIGPLAN-SIGACT Symposium on Principles of Programming Languages, New Orleans, Louisiana, USA, January 15-17, 2003. Ed. by Alex Aiken and Greg Morrisett. ACM, 2003, pp. 224–235. doi: 10.1145/604131.604150. url: https://doi.org/10.1145/604131.604150.
A Untyped Syntax
Dromedary is a representative¹ subset of OCaml defined by the following grammar:
τ ::=                         Type
    | α                       Type variable
    | τ → τ                   Function type
    | τ T                     Applied type constructor
    | τ × ... × τ             Tuple type
    | [ρ]                     Polymorphic variant type
    | (τ)                     Parenthesis
ρ ::=                         Rows
    | (< | ε | >) ‘K [of τ]

¹ The source syntax differs only notationally.
σ ::=                         Scheme
    | ∀α.τ
c ::=                         Constant
    | true
    | false
    | ()                      Unit
    | string literal, e.g. "Hello World"
    | float literal, e.g. 3.14 or .25
    | int literal, e.g. 42 or -12
e ::=                         Expression
    | x                       Variable
    | c                       Constant
    | fun p → e               Function
    | e e                     Function application
    | vd in e                 Let binding
    | (e)                     Parenthesis
    | uop e                   Unary operator primitive
    | e bop e                 Binary operator primitive
    | if e then e else e      If
    | forall (type α) → e     Universal quantifier
    | exists (type α) → e     Existential quantifier
    | (e : τ)                 Annotation
    | { ℓ = e ; ... ; ℓ = e } Record
    | e.ℓ                     Record field access
    | (e, ..., e)             Tuple
    | K [e]                   Constructor
    | match e with (h | ... | h)            Match
    | try e with (h | ... | h)              Try
    | e; e                    Sequence
    | for p = e (to | downto) e do e done   For loop
    | while e do e done       While loop
    | ‘K [e]                  Variant
uop ::=                       Unary operator
    | -                       Negation
    | !                       Dereference
    | ref                     Reference creation
bop ::=                       Binary operator
    | +                       Integer addition
    | −                       Integer subtraction
    | ×                       Integer multiplication
    | /                       Integer division
    | :=                      Assignment
p ::=                         Pattern
    | _                       Wildcard
    | c                       Constant
    | x                       Variable
    | K [p]                   Constructor
    | (p, ..., p)             Tuple
    | (p : τ)                 Annotation
    | ‘K [p]                  Variant
    | (p)                     Parenthesis
h ::=                         Case
    | p → e
where:
B Constraints
The complete constraints language is defined by the following grammar:
C ::=                         Constraint
    | true                    Truth
    | false                   Falsehood
    | C ∧ C                   Conjunction
    | ∃α.C                    Existential quantification
    | ∀α.C                    Universal quantification
    | ∃ζ.C                    Existential ambivalent quantification
    | ζ = ζ                   Equality
    | ψ ⊆ ζ                   Subset
    | ζ :> τ                  Ambivalent coercion
    | R ⇒ C                   Rigid implication
    | def Γ in C              Explicit substitution
    | let Γ in C              Let binding
    | x ≤ ζ                   Variable instantiation
    | σ ≤ ζ                   Scheme instantiation
    | def rec Π in C          Recursive def binding
    | let rec Π in C          Recursive let binding
where
Δ ::= · | Δ, x : ζ            Fragment
Γ ::= ∀α, ζ.C ⇒ Δ             Constrained context
ψ ::=                         Shallow type
    | α                       Type variable
    | ζ F                     Shallow type former
s ::=                         Sorts
    | ⋆                       Type sort
    | row(L)                  Row sort
where 𝓛 is the enumerable set of labels and L ⊆ 𝓛. The sort ⋆ is for basic types, such as int, and row(L) for row types not containing labels in L.
The grammar of the multi-sorted algebra of types τ and type formers F is defined as:
τ ::=                         Types
    | α                       Type variable
    | τ F                     Applied type former
    | ℓ^L : τ :: τ            Row cons
    | ∂^L τ                   Row uniform
    | µα.τ                    Equi-recursive type
where T denotes a basic type constructor – in the dissertation we do not explicitly distinguish
type formers from type constructors as they are often treated the same in many contexts.
Let S be a signature for basic type constructors T, defining an arity function arity_S mapping type constructors T to their arity n ∈ ℕ. A sorting context Γ is a sequence of bindings of type variables α to sorts s.
Ill-sorted types are prevented using sorting judgements of the form S; Γ ⊢ τ :: s, read as: the type τ has the sort s in the context Γ and signature S. Like typing rules, they are defined inductively, as shown below:
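$$
\frac{}{S;\Gamma \vdash \alpha :: \Gamma(\alpha)}\;(\textsc{Type-var})
\qquad
\frac{\mathrm{arity}_S(\mathsf{T}) = n \qquad \forall 1 \le i \le n.\; S;\Gamma \vdash \tau_i :: s}{S;\Gamma \vdash \overline{\tau}\,\mathsf{T}^{s} :: s}\;(\textsc{Type-former-constr})
$$

$$
\frac{S;\Gamma \vdash \tau :: \mathsf{row}(\emptyset)}{S;\Gamma \vdash \Sigma\,\tau :: \star}\;(\textsc{Type-former-variant})
\qquad
\frac{S;\Gamma \vdash \tau :: \star}{S;\Gamma \vdash \partial^{L}\tau :: \mathsf{row}(L)}\;(\textsc{Type-row-uniform})
$$

$$
\frac{S;\Gamma, \alpha : s \vdash \tau :: s}{S;\Gamma \vdash \mu\alpha.\tau :: s}\;(\textsc{Type-mu})
$$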
The superscripts in the algebra of types ensure that symbols are not overloaded and that each
symbol has a unique sort or signature; however, we often omit these superscripts for clarity (as
in Appendix C).
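These sorting rules translate directly into a checking function. The sketch below is our own: it covers only the five rules listed above (there is no row-cons rule among them, so row cons is omitted), represents label sets with OCaml’s Set.Make (String), and takes a hypothetical function arity in place of arity_S. Type formers take the sort they are checked against, mirroring the superscript s in Type-former-constr.

```ocaml
module LabelSet = Set.Make (String)

type sort = Star | Row of LabelSet.t   (* ⋆ and row(L) *)

type ty =
  | Var of string                  (* α *)
  | Constr of string * ty list     (* applied type former, at the sort it is checked against *)
  | Variant of ty                  (* Σ τ *)
  | Uniform of LabelSet.t * ty     (* ∂^L τ *)
  | Mu of string * ty              (* µα.τ *)

let sort_equal s1 s2 =
  match (s1, s2) with
  | Star, Star -> true
  | Row l1, Row l2 -> LabelSet.equal l1 l2
  | _ -> false

(* [check arity ctx t s] decides S; Γ ⊢ t :: s, with [arity] standing in for
   arity_S and [ctx] for the sorting context Γ (innermost binding first). *)
let rec check arity ctx t s =
  match t with
  | Var a -> (
      match List.assoc_opt a ctx with Some s' -> sort_equal s s' | None -> false)
  | Constr (c, args) -> (
      match arity c with
      | Some n -> List.length args = n && List.for_all (fun t' -> check arity ctx t' s) args
      | None -> false)
  | Variant t' -> sort_equal s Star && check arity ctx t' (Row LabelSet.empty)
  | Uniform (l, t') -> sort_equal s (Row l) && check arity ctx t' Star
  | Mu (a, t') -> check arity ((a, s) :: ctx) t' s
```

For example, with int of arity 0, the type Σ (∂^∅ int) checks at sort ⋆, whereas ∂^∅ int alone does not, since Type-row-uniform gives it sort row(∅).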
Our algebra is associated with an equational theory E, defined by the following set of axioms:
Commutativity Labels within row cons and row uniform types may be permuted. That is, for all labels ℓ1, ℓ2 ∈ 𝓛, finite subsets of labels L ⊆_fin 𝓛 \ {ℓ1, ℓ2}, and types τ1, τ2, τ3, the following axioms hold:

$$
\ell_1^{L} : \tau_1 :: (\ell_2^{L \cup \{\ell_1\}} : \tau_2 :: \tau_3) = \ell_2^{L} : \tau_2 :: (\ell_1^{L \cup \{\ell_2\}} : \tau_1 :: \tau_3) \qquad (\textsc{Type-eq-comm-row-cons})
$$

$$
\partial^{L}\tau_1 = \ell_1^{L} : \tau_1 :: \partial^{L \cup \{\ell_1\}}\tau_1 \qquad (\textsc{Type-eq-comm-row-uniform})
$$
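On ground rows ending in a uniform tail, the two axioms together mean that a row is determined by the field it assigns to each label. The sketch below assumes, hypothetically, that fields are just Absent or Present of a type name and that each row lists distinct labels; two rows are then equal when they have the same uniform tail and agree on every label either of them lists. The tails must agree because the label set 𝓛 is infinite, so rows with different tails disagree at some unlisted label.

```ocaml
(* Hypothetical ground field types, just enough to compare rows. *)
type field = Absent | Present of string

(* The ground row  ℓ1 : τ1 :: ... :: ℓn : τn :: ∂ τ, with distinct labels ℓi. *)
type row = { fields : (string * field) list; uniform : field }

(* The field at label l: an explicit entry, or the uniform tail otherwise. *)
let lookup r l =
  match List.assoc_opt l r.fields with Some f -> f | None -> r.uniform

(* Equality modulo the axioms: field order is irrelevant (row cons), and the
   tail ∂ τ unfolds to ℓ : τ :: ∂ τ for any label that is not listed (row uniform). *)
let row_equal r1 r2 =
  r1.uniform = r2.uniform
  && List.for_all
       (fun l -> lookup r1 l = lookup r2 l)
       (List.map fst r1.fields @ List.map fst r2.fields)
```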
Semantics We now formally define the semantic interpretation of types. Informally, the model consists of graphical ground types generated by the grammar. However, the inclusion of rows and equi-recursive types complicates matters.
We describe our graphical types using the notion of paths. A path π is a sequence of integers or labels. The empty path is denoted ε, and the concatenation of the path π1 followed by π2 is written π1 · π2.
A graphical term t over a signature S is defined as a non-empty partial function from paths to S that is prefix-closed and well-sorted. The subterm of t rooted at π, written t \ π, is the function π′ ↦ t(π · π′). The signature of Dromedary’s graphical types S_drom is given by:
Thus, we define a graphical type t as a graphical term over the signature S_drom with a finite number of distinct subterms¹. We write 𝒯 for the set of graphical types. The set of graphical types of sort s is defined as 𝒯_s = {t ∈ 𝒯 : S_drom ⊢ t :: s}.
The interpretation of a type τ of sort s under the ground assignment ϕ (Section 2.1.2), written ϕ(τ^s), is defined as follows:
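$$
\begin{aligned}
\varphi(\alpha^s) &= \varphi(\alpha) \\
\varphi((\overline{\tau}\,\mathsf{T})^{\star}) &= t \in \mathcal{T}_{\star} \text{ s.t. } t(\epsilon) = \mathsf{T} \;\wedge\; \forall 1 \le i \le \mathrm{arity}_S(\mathsf{T}).\; t \setminus i = \varphi(\tau_i^{\star}) \\
\varphi((\overline{\tau}\,\mathsf{T})^{\mathsf{row}(L)}) &= t \in \mathcal{T}_{\mathsf{row}(L)} \text{ s.t. } t(\epsilon) = L \;\wedge\; \forall \ell \in \mathcal{L} \setminus L.\; t(\ell) = \mathsf{T} \\
&\qquad \wedge\; \forall \ell \in \mathcal{L} \setminus L,\ 1 \le i \le \mathrm{arity}_S(\mathsf{T}).\; t \setminus (\ell \cdot i) = \varphi(\tau_i^{\star}) \\
\varphi((\Sigma\,\tau)^{\star}) &= t \in \mathcal{T}_{\star} \text{ s.t. } t(\epsilon) = \Sigma \;\wedge\; t \setminus 1 = \varphi(\tau^{\mathsf{row}(\emptyset)}) \\
\varphi((\ell^{L} : \tau_1 :: \tau_2)^{\mathsf{row}(L)}) &= t \in \mathcal{T}_{\mathsf{row}(L)} \text{ s.t. } t(\epsilon) = L \;\wedge\; t \setminus \ell = \varphi(\tau_1^{\star}) \\
&\qquad \wedge\; \forall \ell' \in \mathcal{L} \setminus (L \cup \{\ell\}).\; t \setminus \ell' = \varphi(\tau_2^{\mathsf{row}(L \cup \{\ell\})}) \setminus \ell' \\
\varphi((\mu\alpha.\tau)^s) &= t \in \mathcal{T}_s \text{ s.t. } t = (\varphi, \alpha \mapsto t)(\tau^s)
\end{aligned}
$$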
¹ This permits cyclic types with a finite encoding.
Type Abbreviations
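A type abbreviation is a type constructor T with an isomorphism $\overline{\alpha}\,\mathsf{T} \cong \tau_{\mathsf{T}}$, where $\overline{\alpha}$ is a tuple of (disjoint) type variables such that $\mathrm{fv}(\tau_{\mathsf{T}}) \subseteq \overline{\alpha}$.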
To reason about equalities in the presence of type abbreviations, we seek to develop rewriting
strategies that carry out the ‘expansions’ of abbreviations.
Intuitively, the rule defines the expansion of the head of t1 , yielding the type t2 .
This rewriting rule applies head expansion in some context (or path π) within t1 , resulting
in the expansion t2 .
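The reflexive transitive closure of $\rightsquigarrow_{\mathsf{T}}$ is denoted $\rightsquigarrow^{*}_{\mathsf{T}}$, and we define the complete expansion relation $\rightsquigarrow^{\infty}_{\mathsf{T}}$ by the equivalence: $t_1 \rightsquigarrow^{\infty}_{\mathsf{T}} t_2$ if and only if $t_1 \rightsquigarrow^{*}_{\mathsf{T}} t_2 \not\rightsquigarrow_{\mathsf{T}}$.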
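The rewriting rules themselves are not reproduced here, so the following is only a sketch of complete expansion under our own assumptions: types are variables, arrows or applied constructors; an abbreviation records its parameters and body; head expansion substitutes the arguments into the body; and complete expansion repeats this at the root and in every subterm until no abbreviation remains. It assumes the abbreviations are acyclic, otherwise expansion does not terminate.

```ocaml
type ty =
  | Var of string
  | Con of string * ty list   (* applied type constructor, e.g. Con ("list", [Var "a"]) *)
  | Arrow of ty * ty

(* An abbreviation  ('a1, ..., 'an) T = body, with fv(body) ⊆ {'a1, ..., 'an}. *)
type abbrev = { params : string list; body : ty }

let rec subst s = function
  | Var a -> ( match List.assoc_opt a s with Some t -> t | None -> Var a)
  | Con (c, args) -> Con (c, List.map (subst s) args)
  | Arrow (t1, t2) -> Arrow (subst s t1, subst s t2)

(* Head expansion: one rewriting step at the root, if the head is an abbreviation.
   List.combine raises Invalid_argument if the arity does not match. *)
let expand_head abbrevs = function
  | Con (c, args) -> (
      match List.assoc_opt c abbrevs with
      | Some { params; body } -> Some (subst (List.combine params args) body)
      | None -> None)
  | Var _ | Arrow _ -> None

(* Complete expansion: expand at the root until stuck, then in every subterm. *)
let rec expand abbrevs t =
  match expand_head abbrevs t with
  | Some t' -> expand abbrevs t'
  | None -> (
      match t with
      | Var _ -> t
      | Con (c, args) -> Con (c, List.map (expand abbrevs) args)
      | Arrow (t1, t2) -> Arrow (expand abbrevs t1, expand abbrevs t2))
```

With the hypothetical abbreviations 'a pair = 'a * 'a and ipair = int pair, the type ipair completely expands to int * int, which no longer contains an abbreviation.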
Semantics
Semantically, constraints are interpreted in the model M consisting of:
(i) The set of graphical types (henceforth referred to as ground types) t for types τ defined in
the above section.
(ii) The set of ground ambivalent types z for ambivalent types, defined as sets of ground types:
z ::= {t1 , . . . , tn }
ground ambivalent types z. An environment ρ is a partial function from term variables x to sets
of ground ambivalent types.
Implications introduce equalities that must be taken into account when checking the consistency of ground ambivalent types, using an equational context E. A ground equational context E ::= · | E, t = t is a collection of assumed equations between ground types. We write E ⊨ t1 =_A t2 if t1 and t2 are contextually equal under the equational context E. Consistency of ambivalent types, E ⊨ z, is simply defined as pairwise equality under the equational context:

$$
E \vDash z \iff \forall 1 \le i, j \le |z|.\; E \vDash t_i =_A t_j
$$
$$
\frac{z =_A \{t\}}{z :> t}\;(\textsc{Coercion})
$$
Intuitively, the axiom states that z :> t holds if the ambivalent type z is the non-ambiguous
type t. This allows us to coerce ambivalent types to types, which is required when embedding
ambivalent variables in rigid constraints R during pattern matching.
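In the simplest setting, consistency E ⊨ z reduces to an equivalence-class check. The sketch below assumes, hypothetically, that ground types are atomic names, so that contextual equality under E is just the equivalence closure of the assumed equations; since that closure is an equivalence relation, the members of z are pairwise equal exactly when they all share one class. Structured ground types would additionally need congruence closure, which this sketch omits.

```ocaml
(* Follow parent links to the representative of x (x itself if it has none). *)
let rec find parent x =
  match Hashtbl.find_opt parent x with
  | Some y when y <> x -> find parent y
  | _ -> x

(* Build the equivalence classes induced by the assumed equations E. *)
let classes_of (eqs : (string * string) list) =
  let parent = Hashtbl.create 16 in
  List.iter
    (fun (a, b) ->
      let ra = find parent a and rb = find parent b in
      if ra <> rb then Hashtbl.replace parent ra rb)
    eqs;
  parent

(* E ⊨ z: all members of z are equal under E, i.e. they lie in a single class. *)
let consistent eqs (z : string list) =
  let parent = classes_of eqs in
  match z with
  | [] -> true
  | t :: rest -> List.for_all (fun u -> find parent u = find parent t) rest
```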
Satisfiability judgements, defined inductively, take the form E; ϑ; ϕ; ρ ⊨ C, read as: in the environment ρ, under the equational context E, the assignments ϑ, ϕ satisfy C:
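$$
\frac{}{E;\vartheta;\varphi;\rho \vDash \mathsf{true}}\;(\textsc{Truth})
\qquad
\frac{\forall i.\; E;\vartheta;\varphi;\rho \vDash C_i}{E;\vartheta;\varphi;\rho \vDash C_1 \wedge C_2}\;(\textsc{Conj})
$$

$$
\frac{E;\vartheta;\varphi, \alpha \mapsto t;\rho \vDash C}{E;\vartheta;\varphi;\rho \vDash \exists\alpha.C}\;(\textsc{Exists})
\qquad
\frac{\forall t.\; E;\vartheta;\varphi, \alpha \mapsto t;\rho \vDash C}{E;\vartheta;\varphi;\rho \vDash \forall\alpha.C}\;(\textsc{Forall})
$$

$$
\frac{E;\vartheta, \zeta \mapsto z;\varphi;\rho \vDash C \qquad E \vDash z}{E;\vartheta;\varphi;\rho \vDash \exists\zeta.C}\;(\textsc{Exists})
\qquad
\frac{E, \varphi(R);\vartheta;\varphi;\rho \vDash C}{E;\vartheta;\varphi;\rho \vDash R \Rightarrow C}\;(\textsc{Implication})
$$

$$
\frac{E;\vartheta;\varphi;\rho, (E;\vartheta;\varphi;\rho)(\Gamma) \vDash C}{E;\vartheta;\varphi;\rho \vDash \mathsf{def}\ \Gamma\ \mathsf{in}\ C}\;(\textsc{Def})
\qquad
\frac{E;\vartheta;\varphi;\rho \vDash \exists\Gamma \qquad E;\vartheta;\varphi;\rho, (E;\vartheta;\varphi;\rho)(\Gamma) \vDash C}{E;\vartheta;\varphi;\rho \vDash \mathsf{let}\ \Gamma\ \mathsf{in}\ C}\;(\textsc{Let})
$$

$$
\frac{E;\vartheta;\varphi;\rho, (E;\vartheta;\varphi;\rho)(\Pi) \vDash C}{E;\vartheta;\varphi;\rho \vDash \mathsf{def\ rec}\ \Pi\ \mathsf{in}\ C}\;(\textsc{Def-rec})
\qquad
\frac{E;\vartheta;\varphi;\rho \vDash \exists\Pi \qquad E;\vartheta;\varphi;\rho, (E;\vartheta;\varphi;\rho)(\Pi) \vDash C}{E;\vartheta;\varphi;\rho \vDash \mathsf{let\ rec}\ \Pi\ \mathsf{in}\ C}\;(\textsc{Let-rec})
$$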
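where the interpretation of constrained contexts and of recursive contexts is given by:

$$
(E;\vartheta;\varphi;\rho)(\forall \overline{\alpha}, \overline{\zeta}.\, C \Rightarrow \zeta) = \{\, \vartheta'(\zeta) : \varphi =_{\setminus \overline{\alpha}} \varphi' \wedge \vartheta =_{\setminus \overline{\zeta}} \vartheta' \wedge E;\vartheta';\varphi';\rho \vDash C \,\}
$$

$$
\exists(\overline{x : \forall\overline{\alpha}.C \Leftarrow \tau},\ \overline{x : \forall\overline{\beta}, \zeta.D \Rightarrow \zeta}) \simeq \forall\overline{\beta}.\exists\overline{\zeta}.\ \mathsf{def}\ \overline{x : \forall\overline{\alpha}.\tau},\ \overline{x : \zeta}\ \mathsf{in}\ \bigwedge_i C_i \wedge \bigwedge_j D_j
$$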
Intuitively ∃Γ checks whether the constraint C in Γ is satisfiable for all rigid variables α.
Similarly, ∃Π checks that all constraints within Π are satisfiable within the recursive context.
C Type System
In this appendix we present the entirety of Dromedary’s type system. We begin by formally
defining the complete multi-sorted algebra of types τ and type formers F:
τ ::=                         Type
    | α                       Type variable
    | ζ                       Ambivalent type variable
    | τ → τ                   Function type
    | τ T                     Applied type constructor
    | τ × ··· × τ             Tuple type
    | Σ τ                     Polymorphic variant type
    | ℓ : τ :: τ              Row cons
    | ∂ τ                     Row uniform
    | µα.τ                    Equi-recursive type
    | τ where α = τ           Explicit type substitution
where ℓ denotes a label and T denotes a type constructor. In the context of Dromedary’s polymorphic variants, we define labels as ℓ ::= K. For more details regarding the multi-sorted algebra of types, we refer the reader to Appendix B.
Split types For the translation of types τ into the shallow types used in constraints, we require the notion of split types. Split types ς are pairs Ξ ▷ ζ, where the (deep) type may be reconstructed from the subset constraints in Ξ and the variable ζ.
More formally, the grammar of split types ς is given by:
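$$
\begin{aligned}
\lfloor \alpha \rfloor &= \cdot \triangleright \alpha \\
\lfloor \zeta \rfloor &= \exists \cdot . \cdot \triangleright \zeta \\
\lfloor \tau_1 \to \tau_2 \rfloor &= \Xi_1 \times \Xi_2 \triangleright \zeta_1 \to \zeta_2 && \text{where } \lfloor \tau_i \rfloor = \Xi_i \triangleright \zeta_i \\
\lfloor \overline{\tau}\,\mathsf{T} \rfloor &= \Xi_1 \times \cdots \times \Xi_n \triangleright \overline{\zeta}\,\mathsf{T} && \text{where } \lfloor \tau_i \rfloor = \Xi_i \triangleright \zeta_i \\
\lfloor \tau_1 \times \cdots \times \tau_n \rfloor &= \Xi_1 \times \cdots \times \Xi_n \triangleright \zeta_1 \times \cdots \times \zeta_n && \text{where } \lfloor \tau_i \rfloor = \Xi_i \triangleright \zeta_i \\
\lfloor \Sigma\,\tau \rfloor &= \Xi \triangleright \Sigma\,\zeta && \text{where } \lfloor \tau \rfloor = \Xi \triangleright \zeta \\
\lfloor \ell : \tau_1 :: \tau_2 \rfloor &= \Xi_1, \Xi_2 \triangleright \ell : \zeta_1 :: \zeta_2 && \text{where } \lfloor \tau_i \rfloor = \Xi_i \triangleright \zeta_i \\
\lfloor \partial\tau \rfloor &= \Xi \triangleright \partial\zeta && \text{where } \lfloor \tau \rfloor = \Xi \triangleright \zeta \\
\lfloor \mu\alpha.\tau \rfloor &= \exists\zeta.\,\Xi, \zeta \supseteq \psi \triangleright \zeta && \text{where } \lfloor \{\zeta/\alpha\}\tau \rfloor = \Xi \triangleright \psi \\
\lfloor \tau_1 \text{ where } \alpha = \tau_2 \rfloor &= \Xi_1 \times \Xi_2 \triangleright \zeta_1 && \text{where } \lfloor \tau_2 \rfloor = \Xi_2 \triangleright \zeta_2,\ \lfloor \{\zeta_2/\alpha\}\tau_1 \rfloor = \Xi_1 \triangleright \zeta_1
\end{aligned}
$$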
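We extend constraints with a subset constraint for types, τ ⊆ ζ, using shallow type translations, such that the following equivalence holds:

$$
\tau \subseteq \zeta \simeq \exists\overline{\zeta}.\, \bigwedge \Omega \wedge \zeta = \zeta' \qquad \text{where } \lfloor \tau \rfloor = \exists\overline{\zeta}.\Omega \triangleright \zeta'
$$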
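A simplified version of this translation can be sketched in OCaml. Unlike ⌊·⌋ above, which returns a shallow type directly, this variant (our own, for illustration) names every non-variable node with a fresh variable zᵢ and records a subset constraint zᵢ ⊇ ψᵢ, so the deep type can be rebuilt from Ξ and the root variable. It covers only variables, arrows and applied constructors.

```ocaml
(* Deep types: variables, arrows and applied constructors (a fragment of τ). *)
type ty = Var of string | Arrow of ty * ty | Con of string * ty list

(* Shallow types ψ: a single type former applied to variables only. *)
type shallow = SArrow of string * string | SCon of string * string list

(* [split t] returns (Ξ, ζ): subset constraints zi ⊇ ψi, as pairs (zi, ψi) in the
   order they are generated, together with the variable naming t itself. *)
let split (t : ty) : (string * shallow) list * string =
  let counter = ref 0 in
  let fresh () =
    incr counter;
    Printf.sprintf "z%d" !counter
  in
  let rec go = function
    | Var a -> ([], a)
    | Arrow (t1, t2) ->
        let xi1, z1 = go t1 in
        let xi2, z2 = go t2 in
        let z = fresh () in
        (xi1 @ xi2 @ [ (z, SArrow (z1, z2)) ], z)
    | Con (c, args) ->
        let parts = List.map go args in
        let z = fresh () in
        (List.concat (List.map fst parts) @ [ (z, SCon (c, List.map snd parts)) ], z)
  in
  go t
```

For example, α → int splits into Ξ = [z1 ⊇ int; z2 ⊇ α → z1] with root variable z2.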
Typing Rules
In this section, we present all of Dromedary’s typing rules.
Structures A structural context Ψ is a sequence of label and constructor bindings, that is:
We write ⌊Ψ⌋ for the abbreviation context A consisting of the abbreviations (or aliases) in Ψ. The following table specifies the judgements for structural contexts:
Judgement Interpretation
D ::=                 Definition
    | ·               Empty definition
    | D, D            Conj definition
    | def Γ           Def binding
    | let Γ           Let binding
    | def rec Π       Recursive def binding
    | let rec Π       Recursive let binding
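(Dromedary-str-nil)  $\dfrac{}{\Psi; \cdot \vdash \cdot}$
(Dromedary-str-item-let)  $\dfrac{\Psi; \Upsilon \vdash vb}{\Psi; \Upsilon \vdash \texttt{let}\ vb \leadsto \Psi}$
(Dromedary-str-item-let-rec)  $\dfrac{\Psi; \Pi \vdash vb}{\Psi; \mathsf{let\ rec}\ \Pi \vdash \texttt{let rec}\ vb \leadsto \Psi}$
(Dromedary-str-item-type)  $\dfrac{}{\Psi; \cdot \vdash \texttt{type}\ td_1\ \texttt{and} \ldots \texttt{and}\ td_n \leadsto \Psi, \overline{td}}$
(Dromedary-str-item-type-ext)  $\dfrac{\Psi' = \Psi, K : \forall\alpha.[\exists\beta.]\, \bigwedge E \Rightarrow [\tau \to]\, \alpha\ \mathsf{T}}{\Psi; \cdot \vdash \texttt{type}\ \alpha\ \mathsf{T} \mathrel{+}= K\ [\texttt{of}\ \beta.\tau]\ [\texttt{constraint}\ E] \leadsto \Psi'}$
(Dromedary-str-item-external)  $\dfrac{}{\Psi; \mathsf{def}\ x : \sigma \vdash \texttt{external}\ x : \sigma = \texttt{"\%\ldots"} \leadsto \Psi}$
(Dromedary-str-item-exception)  $\dfrac{}{\Psi; \cdot \vdash \texttt{exception}\ K\ [\texttt{of}\ \tau] \leadsto \Psi, K : \forall\cdot.\exists\cdot.\mathsf{true} \Rightarrow [\tau \to]\ \mathsf{exn}}$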
Expressions Expression judgements are of the form C ⊢ e : ζ, read as: under the satisfiable
assumptions C, the expression e has the ambivalent type ζ. As in the dissertation (Section 3.1.1),
we leave the structural context Ψ used within the judgements implicit.
The restriction to ambivalent type variables in the judgement yields a restricted and explicit
type system; for a more natural presentation, we therefore also permit judgements of the form
C ⊢ e : τ and C ⊢ e : ψ, given by:
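(Dromedary-exp-tau)  $\dfrac{C \vdash e : \zeta \quad \zeta \mathrel{\#} \tau}{\exists \zeta.\, C \wedge \tau \subseteq \zeta \vdash e : \tau}$
(Dromedary-exp-shallow)  $\dfrac{C \vdash e : \zeta \quad \zeta \mathrel{\#} \psi}{\exists \zeta.\, C \wedge \psi \subseteq \zeta \vdash e : \psi}$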
To support the features of Dromedary's type system discussed in Section 3.1, we extend the
constraint language with the following constructs:
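Constructor instantiation
$K \le [\zeta_1 \to]\, \zeta_2 \;\simeq\; \exists \zeta_\alpha, \zeta_\beta.\, \theta R \wedge \zeta_\alpha\, \mathsf{T} \subseteq \zeta_2\ [\wedge\ \theta\tau \subseteq \zeta_1]$ if $\Psi \vdash K : \forall\alpha.\exists\beta.R \Rightarrow [\tau \to]\, \alpha\, \mathsf{T}$,
where $\theta = \{\zeta_\alpha/\alpha, \zeta_\beta/\beta\}$
Label constraints
$\ell \le \zeta_1 \to \zeta_2 \;\simeq\; \exists \zeta_\alpha, \zeta_\beta.\, \{\zeta_\alpha/\alpha, \zeta_\beta/\beta\}\tau \subseteq \zeta_1 \wedge \zeta_\alpha\, \mathsf{T} \subseteq \zeta_2$ if $\Psi \vdash \ell : \forall\alpha.(\forall\beta.\tau) \to \alpha\, \mathsf{T}$
$\ell : (\forall\beta.\, C \Rightarrow \zeta_1) \to \zeta_2 \;\simeq\; \exists \zeta_\alpha.\, \zeta_\alpha\, \mathsf{T} \subseteq \zeta_2 \wedge \forall\beta.\, \{\zeta_\alpha/\alpha\}\tau \subseteq \zeta_1 \wedge C$ if $\Psi \vdash \ell : \forall\alpha.(\forall\beta.\tau) \to \alpha\, \mathsf{T}$
Variant constraints
$\zeta \le \text{`}K_1\ [\texttt{of}\ \zeta_1] \mid \ldots \mid \text{`}K_n\ [\texttt{of}\ \zeta_n] \;\simeq\; \zeta \supseteq \text{`}K_1 : [\zeta_1] :: \ldots :: \text{`}K_n : [\zeta_n] :: \partial\, \mathsf{absent}$
$\zeta \ge \text{`}K_1\ [\texttt{of}\ \zeta_1] \mid \ldots \mid \text{`}K_n\ [\texttt{of}\ \zeta_n] \;\simeq\; \exists \zeta_\rho.\, \zeta \supseteq \text{`}K_1 : [\zeta_1] :: \ldots :: \text{`}K_n : [\zeta_n] :: \zeta_\rho$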
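The two variant constraints differ only in the tail of the row: ζ ≤ ‘K1 | ... | ‘Kn closes the row with ∂ absent (at most the listed tags, like OCaml's `[< ...]`), while ζ ≥ ‘K1 | ... | ‘Kn leaves it open with a fresh row variable ζρ (at least the listed tags, like `[> ...]`). The following OCaml sketch builds both row shapes; the `row` type, its constructors, and the representation of an optional argument as a `string option` are hypothetical simplifications.

```ocaml
(* A row is a sequence of tagged fields ending in a tail. *)
type row =
  | Row_cons of string * string option * row
      (* `K : [ζ] :: row, with an optional argument variable ζ *)
  | Absent_tail     (* ∂ absent: no further tags may occur *)
  | Row_var of string  (* ζρ: an unknown remainder of the row *)

(* Build `K1 : [ζ1] :: ... :: `Kn : [ζn] :: tail. *)
let build_row fields tail =
  List.fold_right (fun (k, arg) acc -> Row_cons (k, arg, acc)) fields tail

(* ζ ≤ `K1 [of ζ1] | ... | `Kn [of ζn]: a closed row. *)
let upper_bound fields = build_row fields Absent_tail

(* ζ ≥ `K1 [of ζ1] | ... | `Kn [of ζn]: an open row with fresh tail ζρ. *)
let lower_bound ~rho fields = build_row fields (Row_var rho)

(* Can a value tagged [k] inhabit a type whose row is [row]? *)
let rec admits row k =
  match row with
  | Row_cons (k', _, rest) -> k = k' || admits rest k
  | Absent_tail -> false
  | Row_var _ -> true
```

A closed row rejects any tag it does not list, whereas an open row accepts unlisted tags through its tail variable.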
The typing rules are now given by:
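(Dromedary-exp-var)  $\dfrac{C \Vdash x \le \zeta}{C \vdash x : \zeta}$
(Dromedary-exp-const)  $\dfrac{C \Vdash c \le \zeta}{C \vdash c : \zeta}$
(Dromedary-exp-fun)  $\dfrac{C \vdash p \to e : \zeta_1 \Rightarrow \zeta_2}{C \vdash \texttt{fun}\ p \to e : \zeta_1 \to \zeta_2}$
(Dromedary-exp-app)  $\dfrac{C_1 \vdash e_1 : \zeta_1 \to \zeta_2 \quad C_2 \vdash e_2 : \zeta_1}{C_1 \wedge C_2 \vdash e_1\ e_2 : \zeta_2}$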
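As an illustration of how rules such as Dromedary-exp-var, Dromedary-exp-fun and Dromedary-exp-app read as a constraint generator, here is a minimal OCaml sketch for a toy language of variables, single-variable functions and applications. It is a simplification under stated assumptions: patterns, let-polymorphism and ambivalence are omitted, the uses of exp-eq and exp-exist are made explicit as existentials and equations, and all constructor names are hypothetical rather than Dromedary's own.

```ocaml
type var = string

(* Shallow type: a single arrow between two type variables. *)
type shallow = Arrow of var * var

type constr =
  | True
  | Conj of constr * constr
  | Eq of var * shallow           (* ζ = ζ1 → ζ2 *)
  | Exists of var list * constr   (* ∃ζ̄. C *)
  | Inst of string * var          (* x ≤ ζ *)
  | Def of string * var * constr  (* def x : ζ in C *)

type expr =
  | EVar of string
  | EFun of string * expr
  | EApp of expr * expr

(* [gen e z] builds a constraint stating that [e] has type [z]. *)
let gen (e : expr) (z : var) : constr =
  let counter = ref 0 in
  let fresh () =
    incr counter;
    "z" ^ string_of_int !counter
  in
  let rec go e z =
    match e with
    | EVar x -> Inst (x, z) (* exp-var *)
    | EFun (x, body) ->
      (* exp-fun, specialised to a variable pattern *)
      let z1 = fresh () in
      let z2 = fresh () in
      let c = go body z2 in
      Exists ([ z1; z2 ], Conj (Eq (z, Arrow (z1, z2)), Def (x, z1, c)))
    | EApp (e1, e2) ->
      (* exp-app: e1 must have type z1 → z *)
      let z1 = fresh () in
      let zf = fresh () in
      let c1 = go e1 zf in
      let c2 = go e2 z1 in
      Exists ([ z1; zf ], Conj (Eq (zf, Arrow (z1, z)), Conj (c1, c2)))
  in
  go e z
```

For the identity function, the generated constraint binds x at ζ1 and requires x ≤ ζ2, so solving it equates ζ1 and ζ2 and yields ζ0 = ζ1 → ζ1. Generation and solving stay separate, which is the modularity the constraint-based approach aims for.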
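(Dromedary-exp-let)  $\dfrac{\Upsilon \vdash vb \quad C \vdash e : \zeta}{\Upsilon\ \mathsf{in}\ C \vdash \texttt{let}\ vb\ \texttt{in}\ e : \zeta}$
(Dromedary-exp-let-rec)  $\dfrac{\Pi \vdash vb \quad C \vdash e : \zeta}{\mathsf{let\ rec}\ \Pi\ \mathsf{in}\ C \vdash \texttt{let rec}\ vb\ \texttt{in}\ e : \zeta}$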
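(Dromedary-exp-uop)  $\dfrac{C_1 \vdash e : \zeta_1 \quad C_2 \Vdash \mathit{uop} \le \zeta_1 \to \zeta_2}{C_1 \wedge C_2 \vdash \mathit{uop}\ e : \zeta_2}$
(Dromedary-exp-bop)  $\dfrac{C_1 \vdash e_1 : \zeta_1 \quad C_2 \vdash e_2 : \zeta_2 \quad C_3 \Vdash \mathit{bop} \le \zeta_1 \to \zeta_2 \to \zeta_3}{C_1 \wedge C_2 \wedge C_3 \vdash e_1\ \mathit{bop}\ e_2 : \zeta_3}$
(Dromedary-exp-ifthenelse)  $\dfrac{C_1 \vdash e_1 : \mathsf{bool} \quad C_2 \vdash e_2 : \zeta \quad C_3 \vdash e_3 : \zeta}{C_1 \wedge C_2 \wedge C_3 \vdash \texttt{if}\ e_1\ \texttt{then}\ e_2\ \texttt{else}\ e_3 : \zeta}$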
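(Dromedary-exp-forall)  $\dfrac{C_1 \vdash e : \zeta_1 \quad C_2 \Vdash x \le \zeta_2}{\mathsf{let}\ \forall \alpha, \zeta.\, C_1 \Rightarrow x : \zeta_2\ \mathsf{in}\ C_2 \vdash \texttt{forall (type}\ \alpha\texttt{)} \to e : \zeta_1}$
(Dromedary-exp-exists)  $\dfrac{C \vdash \{\zeta/\alpha\}e : \zeta}{\exists \zeta.\, C \vdash \texttt{exists (type}\ \alpha\texttt{)} \to e : \zeta}$
(Dromedary-exp-annot)  $\dfrac{C \Vdash \tau \subseteq \zeta_1 \quad C \Vdash \tau \subseteq \zeta_2 \quad C \vdash e : \zeta_2}{C \vdash (e : \tau) : \zeta_1}$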
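(Dromedary-exp-record)  $\dfrac{\forall 1 \le i \le n.\ C_i \vdash \ell_i = e_i : \zeta \quad \Psi \vdash \mathsf{T}\ \{ \ell_1; \ldots; \ell_n \}}{\bigwedge_{i=1}^{n} C_i \vdash \{ \ell_1 = e_1; \ldots; \ell_n = e_n \} : \zeta}$
(Dromedary-exp-record-field)  $\dfrac{C \vdash e : \zeta_1}{\ell : (\forall \beta.\, C \Rightarrow \zeta_1) \to \zeta_2 \vdash \ell = e : \zeta_2}$
(Dromedary-exp-field)  $\dfrac{C \Vdash \ell \le \zeta_1 \to \zeta_2 \quad C \vdash e : \zeta_2}{C \vdash e.\ell : \zeta_1}$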
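(Dromedary-exp-tuple)  $\dfrac{\forall 1 \le i \le n.\ C_i \vdash e_i : \zeta_i}{\bigwedge_{i=1}^{n} C_i \vdash (e_1, \ldots, e_n) : \zeta_1 \times \cdots \times \zeta_n}$
(Dromedary-exp-construct)  $\dfrac{C \Vdash K \le [\zeta_1 \to]\, \zeta_2 \quad [C \vdash e : \zeta_1]}{C \vdash K\ [e] : \zeta_2}$
(Dromedary-exp-match)  $\dfrac{C_e \vdash e : \zeta_e \quad \forall 1 \le i \le n.\ C_i \vdash h_i : \zeta_e \Rightarrow \zeta}{C_e \wedge \bigwedge_{i=1}^{n} C_i \vdash \texttt{match}\ e\ \texttt{with}\ (h_1 \mid \ldots \mid h_n) : \zeta}$
(Dromedary-exp-try)  $\dfrac{C_e \vdash e : \zeta \quad \forall 1 \le i \le n.\ C_i \vdash h_i : \mathsf{exn} \Rightarrow \zeta}{C_e \wedge \bigwedge_{i=1}^{n} C_i \vdash \texttt{try}\ e\ \texttt{with}\ (h_1 \mid \ldots \mid h_n) : \zeta}$
(Dromedary-exp-seq)  $\dfrac{C_1 \vdash e_1 : \mathsf{unit} \quad C_2 \vdash e_2 : \zeta}{C_1 \wedge C_2 \vdash e_1; e_2 : \zeta}$
(Dromedary-exp-while)  $\dfrac{C_1 \vdash e_1 : \mathsf{bool} \quad C_2 \vdash e_2 : \mathsf{unit}}{C_1 \wedge C_2 \vdash \texttt{while}\ e_1\ \texttt{do}\ e_2\ \texttt{done} : \mathsf{unit}}$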
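(Dromedary-exp-variant)  $\dfrac{[C \vdash e : \zeta'] \quad C \Vdash \zeta \ge \text{`}K\ [\texttt{of}\ \zeta']}{C \vdash \text{`}K\ [e] : \zeta}$
(Dromedary-exp-var-match-closed)  $\dfrac{C \vdash e : \zeta_e \quad C \Vdash \zeta_e \le \text{`}K_i\ [\texttt{of}\ \zeta_i] \quad \forall 1 \le i \le n.\ C_i \vdash \text{`}K_i\ [p_i] \to e_i : \text{`}K_i\ [\texttt{of}\ \zeta_i] \Rightarrow \zeta}{C \wedge \bigwedge_{i=1}^{n} C_i \vdash \texttt{match}\ e\ \texttt{with}\ \text{`}K_i\ [p_i] \to e_i : \zeta}$
(Dromedary-exp-var-match-open)  $\dfrac{C \vdash e : \zeta_e \quad C \Vdash \zeta_e \ge \text{`}K_i\ [\texttt{of}\ \zeta_i] \quad \forall 1 \le i \le n.\ C_i \vdash \text{`}K_i\ [p_i] \to e_i : \text{`}K_i\ [\texttt{of}\ \zeta_i] \Rightarrow \zeta \quad C_{n+1} \vdash e_{n+1} : \zeta}{C \wedge \bigwedge_{i=1}^{n+1} C_i \vdash \texttt{match}\ e\ \texttt{with}\ (\text{`}K_i\ [p_i] \to e_i \mid \_ \to e_{n+1}) : \zeta}$
(Dromedary-exp-eq)  $\dfrac{C \vdash e : \zeta_2}{C \wedge \zeta_1 = \zeta_2 \vdash e : \zeta_1}$
(Dromedary-exp-exist)  $\dfrac{C \vdash e : \zeta_2 \quad \zeta_1 \neq \zeta_2}{\exists \zeta_1.\, C \vdash e : \zeta_2}$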
Judgements for non-recursive and recursive value bindings are of the form Υ ⊢ vb and Π ⊢ vb,
respectively.
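(Dromedary-vb-rec-mono)  $\dfrac{C \vdash e : \zeta}{x : \forall \alpha, \mathrm{fav}(C), \zeta.\, C \Rightarrow \zeta \vdash (\texttt{type}\ \alpha)\ x = e}$
(Dromedary-vb-rec-poly)  $\dfrac{C \vdash e : \tau}{x : \forall \alpha.\, C \Leftarrow \tau \vdash (\texttt{type}\ \alpha)\ x : \tau = e}$
(Dromedary-vb-val)  $\dfrac{C_p \vdash p : \zeta \leadsto \exists \alpha, \beta.\, \Delta \Rightarrow R \quad C_e \vdash v : \zeta}{\mathsf{let}\ \exists \alpha. \forall \beta.\, R \implies \forall \gamma, \mathrm{fav}(C_p, C_e, \Delta).\, C_p \wedge C_e \Rightarrow \Delta \vdash (\texttt{type}\ \gamma)\ p = v}$
(Dromedary-vb-nonval)  $\dfrac{C_p \vdash p : \zeta \leadsto \exists \alpha, \beta.\, \Delta \Rightarrow R \quad C_e \vdash e : \zeta}{\exists \alpha. \forall \beta.\, R \implies \forall \gamma.\, \exists \mathrm{fav}(C_e, C_p, \Delta).\, C_e \wedge C_p \wedge \mathsf{def}\ \Delta \vdash (\texttt{type}\ \gamma)\ p = e}$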
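The choice between Dromedary-vb-val and Dromedary-vb-nonval hinges on whether the bound expression is a syntactic value v. As a rough sketch, assuming the classic syntactic value restriction (OCaml itself uses a relaxed variant, and Dromedary's exact criterion may differ), the classification can be written as follows; the expression type and the names `is_value`, `binding_rule` and `rule_for` are hypothetical.

```ocaml
type expr =
  | Var of string
  | Const of int
  | Fun of string * expr
  | App of expr * expr
  | Tuple of expr list
  | Construct of string * expr option

(* Syntactic values: evaluating them performs no computation, so in
   particular they cannot allocate references. *)
let rec is_value = function
  | Var _ | Const _ | Fun _ -> true
  | Tuple es -> List.for_all is_value es
  | Construct (_, None) -> true
  | Construct (_, Some e) -> is_value e
  | App _ -> false

type binding_rule = Vb_val | Vb_nonval

(* Values may be generalized (vb-val); other expressions keep their
   free variables existential (vb-nonval). *)
let rule_for e = if is_value e then Vb_val else Vb_nonval
```

For instance, `fun x -> x` and `(1, fun y -> y)` are classified as values, whereas an application such as `ref []` is not.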
Patterns Judgements for patterns and cases have the forms C ⊢ p : ζ ⇝ Θ and
C ⊢ p → e : ζ1 ⇒ ζ2, respectively. The first reads: under the satisfiable assumptions C, the
pattern p has type ζ and binds variables in the generalized fragment Θ. The second reads: under
the satisfiable assumptions C, the case p → e matches values of type ζ1 and returns values of
type ζ2.
A generalized fragment Θ is a tuple consisting of a context of flexibly bound variables α in
rigid constraints, existential variables β, a rigid constraint R, and a fragment ∆, written
Θ ::= ∃α, β.∆ ⇒ R.
The flexibly bound (non-ambivalent) variables α in Θ propagate type information from the
instantiation constraints defined below. These constraints involve coercion constraints ζ :> α,
which ensure that ambivalent types can be coerced to non-ambivalent types.
Since constructor instantiation in patterns differs semantically from instantiation in expressions,
we redefine constructor instantiation constraints for patterns using the following equivalences:
As with expressions, we permit judgements involving types and ambivalent structures, yielding
the analogous rules Dromedary-pat-tau and Dromedary-pat-shallow. The typing rules are given
by:
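(Dromedary-var-case1)  $\dfrac{C \vdash p \to e : \zeta_1 \Rightarrow \zeta_2}{C \vdash \text{`}K\ [p] \to e : \text{`}K\ \texttt{of}\ \zeta_1 \Rightarrow \zeta_2}$
(Dromedary-var-case2)  $\dfrac{C \vdash e : \zeta}{C \vdash \text{`}K \to e : \text{`}K \Rightarrow \zeta}$
(Dromedary-case)  $\dfrac{C_1 \vdash p : \zeta_1 \leadsto \exists \beta.\, \Delta \Rightarrow R \quad C_2 \vdash e : \zeta_2}{\exists \alpha. \forall \beta.\, R \implies \mathsf{let}\ \forall \mathrm{fav}(C_1, \Delta).\, C_1 \Rightarrow \Delta\ \mathsf{in}\ C_2 \vdash p \to e : \zeta_1 \Rightarrow \zeta_2}$
(Dromedary-pat-wild)  $\dfrac{}{C \vdash \_ : \zeta \leadsto \exists \cdot.\, \cdot \Rightarrow \mathsf{true}}$
(Dromedary-pat-var)  $\dfrac{}{C \vdash x : \zeta \leadsto \exists \cdot.\, x : \zeta \Rightarrow \mathsf{true}}$
(Dromedary-pat-const)  $\dfrac{C \Vdash c \le \zeta}{C \vdash c : \zeta \leadsto \exists \cdot.\, \cdot \Rightarrow \mathsf{true}}$
(Dromedary-pat-construct0)  $\dfrac{C \Vdash K \le \exists \alpha.\, \zeta \Rightarrow R}{C \vdash K : \zeta \leadsto \exists \alpha.\, \cdot \Rightarrow R}$
(Dromedary-pat-construct1)  $\dfrac{C \Vdash K \le \exists \alpha, \beta.\, \zeta_1 \to \zeta_2 \Rightarrow R \quad C \vdash p : \zeta_1 \leadsto \Theta}{C \vdash K\ p : \zeta_2 \leadsto \exists \alpha, \beta.\, \Theta \Rightarrow R}$
(Dromedary-pat-tuple)  $\dfrac{\forall 1 \le i \le n.\ C_i \vdash p_i : \zeta_i \leadsto \Theta_i}{\bigwedge_{i=1}^{n} C_i \vdash (p_1, \ldots, p_n) : \zeta_1 \times \cdots \times \zeta_n \leadsto \Theta_1 \times \cdots \times \Theta_n}$
(Dromedary-pat-eq)  $\dfrac{C \vdash p : \zeta_2 \leadsto \Theta}{C \wedge \zeta_1 = \zeta_2 \vdash p : \zeta_1 \leadsto \Theta}$
(Dromedary-pat-exist)  $\dfrac{C \vdash p : \zeta_2 \leadsto \Theta \quad \zeta_1 \neq \zeta_2}{\exists \zeta_1.\, C \vdash p : \zeta_2 \leadsto \Theta}$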
D Computations
The domain-specific language for computations (Section 3.2.3) is embedded with the following
signature:
type 'a t

include Monad.S with type 'a t := 'a t

(* Applicative combination of two constraints. *)
val ( <*> )
  : ('a -> 'b) Constraint.t
  -> 'a Constraint.t
  -> 'b Constraint.t

module Let_syntax : sig
  val return : 'a -> 'a t

  val ( let@ )
    : 'a Binder.t
    -> ('a -> 'b Constraint.t t)
    -> 'b Constraint.t t

  (* Map a function over the result of a constraint. *)
  val ( >>| )
    : 'a Constraint.t
    -> ('a -> 'b)
    -> 'b Constraint.t

  val ( <*> )
    : ('a -> 'b) Constraint.t
    -> 'a Constraint.t
    -> 'b Constraint.t
end
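To give some intuition for the applicative operations on `Constraint.t` in this signature (`<*>` and `>>|`), here is a toy model in the spirit of Pottier's elaboration-in-applicative-style framework: a constraint is paired with a function that reads a result back out of a solution once the constraint has been solved. Everything here (the `constr` type, a `solution` modelled as a function from variables to printed types, `eq` and `decode`) is a hypothetical simplification, not Dromedary's `Constraint` module.

```ocaml
type var = int

(* A tiny constraint syntax: just enough to show composition. *)
type constr =
  | True
  | Conj of constr * constr
  | Eq of var * var

(* A solution maps each variable to the (printed) type assigned to it. *)
type solution = var -> string

(* Pair a constraint with a way to elaborate an ['a] from a solution. *)
type 'a t = { constr : constr; elab : solution -> 'a }

let return x = { constr = True; elab = (fun _ -> x) }

(* Map over the elaborated result; the constraint is unchanged. *)
let ( >>| ) t f = { constr = t.constr; elab = (fun s -> f (t.elab s)) }

(* Conjoin both constraints and combine both results. *)
let ( <*> ) tf ta =
  { constr = Conj (tf.constr, ta.constr); elab = (fun s -> (tf.elab s) (ta.elab s)) }

(* Require two variables to be equal; elaborates to unit. *)
let eq v1 v2 = { constr = Eq (v1, v2); elab = (fun _ -> ()) }

(* Read back the type assigned to a variable. *)
let decode v = { constr = True; elab = (fun s -> s v) }
```

Because `elab` only runs after solving, the whole constraint can be built and solved once, and the elaborated result is then read back in a single pass.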
E Proposal
2377E
Introduction
Objective Caml (OCaml), introduced by X. Leroy [7], is a popular and advanced functional
programming language based on ML, a simple calculus defined by R. Milner [8] that offers a
restricted form of polymorphism, known as let-based polymorphism, with decidable type
inference.
The core language (referred to as Core ML) extends ML with the following features: mutually
recursive let-bindings, algebraic data types, patterns, constants, records, mutable references
(and the value restriction), exceptions and type annotations. OCaml’s major extensions on
Core ML consist of first-class and recursive modules, classes and objects, polymorphic variants,
semi-explicit first-class polymorphism, generalized algebraic data types (GADTs), the relaxed
value restriction, type abbreviations, and labels.
In this project, we will implement a constraint-based type inference algorithm for a subset of
OCaml, provisionally dubbed Dromedary, consisting of ML with mutually recursive let-bindings,
records and type annotations (a subset of Core ML, provisionally dubbed Procaml¹), and GADTs.
OCaml's inference algorithm is based on Algorithm W [8], with D. Rémy's [11] efficient rank-based
generalization and modifications for the above extensions. While efficient, it has become difficult
to maintain and evolve [14]. Dromedary's solution is to re-implement OCaml's type inference
using a constraint-based approach.
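For readers unfamiliar with rank-based (level-based) generalization, the following is a minimal OCaml sketch of the idea for a toy type language with only `int` and arrows. It is not OCaml's actual type checker code, and all names (`current_level`, `generalize`, `instantiate`, and so on) are hypothetical. Each unification variable records the let-nesting level at which it was created; unification lowers levels when a variable escapes into an outer type; and generalization quantifies only variables whose level is deeper than the current one, which avoids scanning the whole typing environment.

```ocaml
(* Types with mutable unification variables that record a level. *)
type ty =
  | TInt
  | TArrow of ty * ty
  | TVar of tvar ref
  | TGen of int  (* a generalized (quantified) variable *)

and tvar =
  | Unbound of int * int  (* unique id, level *)
  | Link of ty

let current_level = ref 0
let next_id = ref 0
let enter_level () = incr current_level
let leave_level () = decr current_level

let new_var () =
  incr next_id;
  TVar (ref (Unbound (!next_id, !current_level)))

(* Follow links to the representative type. *)
let rec repr t =
  match t with
  | TVar { contents = Link t' } -> repr t'
  | _ -> t

exception Unify_error

(* Occurs check that also lowers levels: variables reachable from a type
   bound at level [lvl] must not be generalized beyond [lvl]. *)
let rec occurs_adjust id lvl t =
  match repr t with
  | TVar ({ contents = Unbound (id', lvl') } as r) ->
    if id = id' then raise Unify_error;
    if lvl' > lvl then r := Unbound (id', lvl)
  | TArrow (a, b) ->
    occurs_adjust id lvl a;
    occurs_adjust id lvl b
  | TInt | TGen _ | TVar { contents = Link _ } -> ()

let rec unify t1 t2 =
  match (repr t1, repr t2) with
  | TVar r1, TVar r2 when r1 == r2 -> ()
  | TInt, TInt -> ()
  | TArrow (a1, b1), TArrow (a2, b2) ->
    unify a1 a2;
    unify b1 b2
  | TVar ({ contents = Unbound (id, lvl) } as r), t
  | t, TVar ({ contents = Unbound (id, lvl) } as r) ->
    occurs_adjust id lvl t;
    r := Link t
  | _ -> raise Unify_error

(* Quantify only variables created at a deeper level than the current one. *)
let rec generalize t =
  match repr t with
  | TVar { contents = Unbound (id, lvl) } when lvl > !current_level -> TGen id
  | TArrow (a, b) -> TArrow (generalize a, generalize b)
  | t -> t

(* Replace each generalized variable by a fresh variable at the current level. *)
let instantiate t =
  let table = Hashtbl.create 8 in
  let rec go t =
    match repr t with
    | TGen id -> (
      match Hashtbl.find_opt table id with
      | Some v -> v
      | None ->
        let v = new_var () in
        Hashtbl.add table id v;
        v)
    | TArrow (a, b) -> TArrow (go a, go b)
    | t -> t
  in
  go t
```

Typing `let id = fun x -> x` would call `enter_level` before inferring the bound expression and `leave_level` before `generalize`, so the variable for `x` is quantified, while variables shared with the enclosing scope keep a shallower level and stay monomorphic.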
A constraint-based approach would provide a modular implementation of type inference, with
separate constraint generation, constraint solving, and type reconstruction phases, built on a
small, independent constraint language. Further advantages include the ability to combine
existing constraint-based approaches with OCaml's own approaches to increase permissiveness,
and potential applications in OCaml's ecosystem.
Previous work to improve OCaml's inference algorithm focuses on incremental changes to the
current implementation [14]. Our work is more ambitious: it aims to provide the foundation for
a complete rewrite, which we believe to be worthwhile.
Although Dromedary appears simple, its inference must contend with many challenging issues
of GADT type inference; previous work highlights that:
• Type systems with GADTs lack the principal ("most general") type property. M. Sulzmann
et al. [13] show that programs with GADTs may have infinitely many maximal types. Hence,
a complete (unrestricted) inference algorithm must consider all of these types, adding
significant complexity.
• GADT pattern matching introduces local typing constraints, which may result in branches
having different types. Reconciling these types is difficult.
• GADT programs rely extensively on A. Mycroft's polymorphic recursion [9]. However,
F. Henglein [4] and A. J. Kfoury et al. [6] proved that inference with polymorphic recursion
is undecidable.
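These difficulties are visible in a standard OCaml example (ordinary OCaml, not taken from Dromedary's test suite): an evaluator for a typed expression language. Each branch of `eval` returns a different type, and the recursive calls are made at instances of `a` other than the one being defined, so OCaml only accepts the function with the explicit polymorphic annotation `type a. a term -> a`, mirroring the annotations Dromedary requires for polymorphic recursion.

```ocaml
(* A typed expression language: the index records each term's type. *)
type _ term =
  | Int : int -> int term
  | Bool : bool -> bool term
  | Add : int term * int term -> int term
  | If : bool term * 'a term * 'a term -> 'a term
  | Pair : 'a term * 'b term -> ('a * 'b) term

(* Matching on [Int] refines [a] to int and matching on [Bool] refines it
   to bool, so the branches have different types. The recursive calls are
   at other instances of [a], which is polymorphic recursion. *)
let rec eval : type a. a term -> a = function
  | Int n -> n
  | Bool b -> b
  | Add (x, y) -> eval x + eval y
  | If (c, t, e) -> if eval c then eval t else eval e
  | Pair (x, y) -> (eval x, eval y)
```

Removing the annotation, or replacing it with a monomorphic `(type a)` binder, causes OCaml to reject the definition.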
Dromedary addresses these issues via a novel combination of Haskell’s OutsideIn [12] and
OCaml’s ambivalent types [2]. Constraint propagation and ambivalent types equip Dromedary
with sufficient expressiveness to reconcile differing branch types. Dromedary will require type
annotations for polymorphic recursion, guaranteeing the decidability of inference.
We will evaluate Dromedary’s inference algorithm against OCaml’s (with respect to the imple-
mented features), considering aspects such as permissiveness and efficiency.
¹ After Procamelus, an extinct genus of camel.
Starting Point
I'm familiar with types, having studied Semantics of Programming Languages. I have no
previous experience in type inference beyond ML's classical inference algorithms [8]. I have a
basic knowledge of constraint solving, having studied Prolog and Logic and Proof.
Before starting, I read literature on OCaml's type system to investigate the feasibility of
the project. I have practical experience writing OCaml programs from Foundations of Computer
Science and extra-curricular study. I also have practical experience extending the OCaml type checker.
1. Dromedary’s type system will be formally defined, using concepts from Semantics of
Programming Languages. Its operational semantics is given by a subset of OCaml’s
semantics.
GADTs will use a novel combination of Haskell’s OutsideIn [12] and OCaml’s ambivalent
types [2], designed to increase permissiveness.
2. We will design a (first-order) constraint language for Dromedary. We will then define a
mapping (known as the constraint generation mapping) from candidate typing judgements
(e.g. e : τ) to constraints.
4. Several properties of Dromedary will be stated but not proved. These include principal
types, decidability, soundness and completeness of inference. We will verify these properties
empirically, using tests from the OCaml type checker test suite.
5. Dromedary’s inference algorithm will extend F. Pottier’s framework [10] for modular and
efficient constraint generation, constraint solving, and type reconstruction, implemented
in OCaml.
The first-order unification algorithm for constraint solving will follow Huet [5], using an
efficient union-find data structure.
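A minimal OCaml sketch of first-order unification over a union-find structure (with path compression and union by rank) follows. Each equivalence class of variables may carry a shallow structure, a constructor name applied to argument variables, loosely echoing the shallow types used in constraints. The names are hypothetical, this is neither F. Pottier's nor Dromedary's solver, and the occurs check is omitted for brevity.

```ocaml
(* Shallow structure: a constructor name applied to argument variables. *)
type structure = { name : string; args : int list }

(* One union-find node per variable. *)
type node = {
  mutable parent : int;
  mutable rank : int;
  mutable structure : structure option;  (* None: an unconstrained variable *)
}

exception Clash of string * string

let create n = Array.init n (fun i -> { parent = i; rank = 0; structure = None })

(* Find the representative of [i], compressing the path as we go. *)
let rec find uf i =
  let p = uf.(i).parent in
  if p = i then i
  else begin
    let root = find uf p in
    uf.(i).parent <- root;
    root
  end

(* Merge the classes of [i] and [j] by rank, then unify their structures. *)
let rec unify uf i j =
  let ri = find uf i in
  let rj = find uf j in
  if ri <> rj then begin
    let si = uf.(ri).structure in
    let sj = uf.(rj).structure in
    let root, child = if uf.(ri).rank < uf.(rj).rank then (rj, ri) else (ri, rj) in
    if uf.(ri).rank = uf.(rj).rank then uf.(root).rank <- uf.(root).rank + 1;
    uf.(child).parent <- root;
    match (si, sj) with
    | None, s | s, None -> uf.(root).structure <- s
    | Some a, Some b ->
      if a.name <> b.name || List.length a.args <> List.length b.args then
        raise (Clash (a.name, b.name));
      uf.(root).structure <- Some a;
      (* Decompose: unify the arguments pairwise. *)
      List.iter2 (unify uf) a.args b.args
  end
```

Unifying variable 0 bound to `1 -> 2` with variable 4 bound to `3 -> 3`, where 3 is `int`, merges 1, 2 and 3 into a single class whose structure is `int`.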
This project will follow an incremental structure, focusing first on Procaml and then on GADTs;
at each stage, we extend the type system, semantics, constraint language, and constraint solver.
Thus the structure of the project is as follows:
1. An in-depth study of OCaml's type system to ensure I have the correct details before
starting work, focusing on [2].
4. Defining and implementing Dromedary's constraints and constraint solving for Procaml.
5. Implementing Dromedary’s GADTs.
Success Criteria
For the project to be deemed a success, the following must be successfully completed:
1. Design the type system of Dromedary. This should support ML with GADTs.
Evaluation
The following is a list of possible extensions to the project:
2. Adding polymorphic variants [1] to Dromedary. The implementation will require the
notion of subtyping constraints.
3. Adding semi-explicit first-class polymorphism [3] to Dromedary. This will use an efficient
rank-based approach [11].
4. Proving properties about Dromedary’s semantics and inference, including progress, preser-
vation, principal types, and the soundness and completeness of inference.
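Both extensions (2) and (3) have direct counterparts in OCaml today. The following plain OCaml snippet (not Dromedary code) shows the behaviour they would need to reproduce: a function matching on polymorphic variants receives a closed upper-bound type, a constructed variant only a lower bound, and a record field with an explicit `'a.` quantifier holds a polymorphic function that can be used at several types.

```ocaml
(* Matching on polymorphic variants: OCaml infers an upper bound,
   [< `Rect of float * float | `Square of float ], for the input type. *)
let area = function
  | `Square s -> s *. s
  | `Rect (w, h) -> w *. h

(* A constructed variant only has a lower bound: [> `Square of float ]. *)
let shape = `Square 2.0

(* Semi-explicit first-class polymorphism via a polymorphic record field. *)
type poly_id = { f : 'a. 'a -> 'a }

(* [p.f] is used at two different types inside the same function. *)
let use (p : poly_id) = (p.f 1, p.f "one")

let result = use { f = (fun x -> x) }
```

The upper and lower bounds are exactly where the subtyping-like constraints of extension (2) arise, while the explicit field annotation is what keeps inference for extension (3) decidable.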
Weeks 3 to 4 (21st Oct – 3rd Nov)
Define the constraint language and its semantics.
Implement Dromedary’s constraint solver as a set of constraint rewriting rules. Write some
example constraints and verify the solver solves them correctly.
Milestone: Implemented constraint solver for Dromedary.
Weeks 17 to 18 (27th Jan – 9th Feb)
Progress report deadline and presentation.
Start work on possible project extensions if time permits. Focus on extension (1) - adding
polymorphic variants to Dromedary.
Add additional test cases for work completed on extension (1).
Milestone: Finish implementation of extension (1).
Milestone: Complete progress report and presentation.
Incorporate feedback from supervisor and submit a new draft to supervisor and director of
studies.
Slack time for improving code quality, focusing on documentation and code style.
Resource Declaration
I will be using my personal computer (3.20GHz i7-8700, 16GB RAM, 1TB SSD) as my primary
machine for software development. I accept full responsibility for this machine and I have made
contingency plans to protect myself against hardware and/or software failure.
As a backup, I will use my personal laptop (Razer Blade Stealth 2017 – 1.80 GHz i7-8500U,
16GB RAM, 1TB SSD) and the Computing Service's MCS. I will periodically back up the
dissertation and project implementation using Git version control (GitHub).
References
[1] Jacques Garrigue. “Simple Type Inference for Structural Polymorphism”. In: The Second
Asian Workshop on Programming Languages and Systems, APLAS’01, Korea Advanced
Institute of Science and Technology, Daejeon, Korea, December 17-18, 2001, Proceedings.
2001, pp. 329–343.
[2] Jacques Garrigue and Didier Rémy. "Ambivalent Types for Principal Type Inference with
GADTs". In: Programming Languages and Systems - 11th Asian Symposium, APLAS
2013, Melbourne, VIC, Australia, December 9-11, 2013. Proceedings. Ed. by Chung-chieh
Shan. Vol. 8301. Lecture Notes in Computer Science. Springer, 2013, pp. 257–272. doi:
10.1007/978-3-319-03542-0_19. url: https://doi.org/10.1007/978-3-319-03542-0_19.
[3] Jacques Garrigue and Didier Rémy. "Semi-Explicit First-Class Polymorphism for ML".
In: Inf. Comput. 155.1-2 (1999), pp. 134–169. doi: 10.1006/inco.1999.2830. url:
https://doi.org/10.1006/inco.1999.2830.
[4] Fritz Henglein. "Type Inference with Polymorphic Recursion". In: ACM Trans. Program.
Lang. Syst. 15.2 (1993), pp. 253–289. doi: 10.1145/169701.169692. url:
https://doi.org/10.1145/169701.169692.
[5] Gérard P. Huet. "A Unification Algorithm for Typed lambda-Calculus". In: Theor. Comput.
Sci. 1.1 (1975), pp. 27–57. doi: 10.1016/0304-3975(75)90011-0. url:
https://doi.org/10.1016/0304-3975(75)90011-0.
[6] A. J. Kfoury, Jerzy Tiuryn, and Pawel Urzyczyn. "Type Reconstruction in the Presence of
Polymorphic Recursion". In: ACM Trans. Program. Lang. Syst. 15.2 (1993), pp. 290–311.
doi: 10.1145/169701.169687. url: https://doi.org/10.1145/169701.169687.
[7] Xavier Leroy. The ZINC experiment: an economical implementation of the ML language.
Technical report 117. INRIA, 1990. url: https://xavierleroy.org/publi/ZINC.pdf.
[8] Robin Milner. "A Theory of Type Polymorphism in Programming". In: J. Comput. Syst.
Sci. 17.3 (1978), pp. 348–375. doi: 10.1016/0022-0000(78)90014-4. url:
https://doi.org/10.1016/0022-0000(78)90014-4.
[9] Alan Mycroft. "Polymorphic Type Schemes and Recursive Definitions". In: International
Symposium on Programming, 6th Colloquium, Toulouse, France, April 17-19, 1984,
Proceedings. Ed. by Manfred Paul and Bernard Robinet. Vol. 167. Lecture Notes in Computer
Science. Springer, 1984, pp. 217–228. doi: 10.1007/3-540-12925-1_41. url:
https://doi.org/10.1007/3-540-12925-1_41.
[10] François Pottier. "Hindley-Milner elaboration in applicative style: functional pearl". In:
Proceedings of the 19th ACM SIGPLAN international conference on Functional programming,
Gothenburg, Sweden, September 1-3, 2014. Ed. by Johan Jeuring and Manuel
M. T. Chakravarty. ACM, 2014, pp. 203–212. doi: 10.1145/2628136.2628145. url:
https://doi.org/10.1145/2628136.2628145.
[11] Didier Rémy. Extending ML Type System with a Sorted Equational Theory. Research
Report 1766. Rocquencourt, BP 105, 78 153 Le Chesnay Cedex, France: Institut National
de Recherche en Informatique et Automatisme, 1992. url:
http://gallium.inria.fr/~remy/ftp/eq-theory-on-types.pdf.
[12] Tom Schrijvers et al. "Complete and decidable type inference for GADTs". In: Proceeding
of the 14th ACM SIGPLAN international conference on Functional programming, ICFP
2009, Edinburgh, Scotland, UK, August 31 - September 2, 2009. Ed. by Graham Hutton
and Andrew P. Tolmach. ACM, 2009, pp. 341–352. doi: 10.1145/1596550.1596599. url:
https://doi.org/10.1145/1596550.1596599.
[13] Martin Sulzmann et al. Type inference for GADTs via Herbrand constraint abduction.
Tech. rep. Jan. 2008. url:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.4392.
[14] The OCaml Team. TODO for the OCaml type-checker implementation. 2020. url:
https://github.com/ocaml/ocaml/blob/4.12.0/typing/TODO.md.