
Alistair O’Brien

Typing OCaml in OCaml:


A Constraint-Based Approach

Computer Science Tripos – Part II


Queens’ College

May 13, 2022


Declaration
I, Alistair O’Brien of Queens’ College, being a candidate for Part II of the Computer Science
Tripos, hereby declare that this dissertation and the work described in it are my own work,
unaided except as may be specified below, and that the dissertation does not contain material
that has already been used to any substantial extent for a comparable purpose.

I, Alistair O’Brien of Queens’ College, am content for my dissertation to be made available to


the students and staff of the University.

Signed: Alistair O’Brien


Date: May 13, 2022
Proforma
Candidate number: 2377E
Project Title: Typing OCaml in OCaml:
A Constraint-Based Approach
Examination: Computer Science Tripos – Part II, 2022
Word Count: 11,999¹
Code Line Count: 15,888²
Project Originator: The Dissertation Author
Project Supervisor: Mistral Contrastin and Dr. Jeremy Yallop

Original Aims of the Project


The project’s original aim was to demonstrate the feasibility of a constraint-based type inference
algorithm for a subset of OCaml, dubbed Dromedary. The subset would extend the ML calculus
with generalised algebraic data types. Unlike OCaml’s current inference algorithm, which has
become challenging to maintain and evolve, Dromedary’s constraint-based approach would
prioritise modularity and correctness. The permissiveness and performance of Dromedary’s
inference algorithm would be evaluated against OCaml’s current (4.12.0) implementation.
Possible extensions included implementing side-effecting primitives, polymorphic variants, and
semi-explicit first-class polymorphism.

Work Completed
Exceeded all success criteria and completed all extensions. Dromedary supports ML polymor-
phism, ADTs, patterns, records, side-effecting primitives, mutually recursive let-bindings and
type definitions, GADTs, polymorphic variants, extensible variants, semi-explicit first-class
polymorphism, type abbreviations, and structures. I formally defined Dromedary and its type
system in a constraint-based setting. I developed a sufficiently expressive constraint language,
with novel extensions on existing work. I implemented a modular and efficient constraint-based
type inference algorithm for Dromedary, which is equally permissive and more performant in
comparison to OCaml.

Special Difficulties
None.

¹ This word count was computed using texcount.
² This code line count was computed using cloc (excluding autogenerated test output).

Contents
1 Introduction 1
1.1 OCaml . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Project Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Preparation 3
2.1 Type Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 The ML Type System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.2 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.3 Constraint-Based ML : PCB . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 OCaml . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.1 Modules and Functors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Functors, Applicatives and Monads . . . . . . . . . . . . . . . . . . . . . 9
2.2.3 Generalised Algebraic Data Types . . . . . . . . . . . . . . . . . . . . . . 11
2.2.4 Polymorphic Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.5 Semi-Explicit First-Class Polymorphism . . . . . . . . . . . . . . . . . . 13
2.2.6 Polymorphic Variants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Requirements Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.1 Model of Software Development . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.2 Tools Used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.3 License . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Starting Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3 Implementation 18
3.1 Dromedary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.1 Algebraic Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.2 Annotations and Polymorphic Recursion . . . . . . . . . . . . . . . . . . 20
3.1.3 Semi-explicit First-class Polymorphism . . . . . . . . . . . . . . . . . . . 22
3.1.4 Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.5 Polymorphic Variants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1.6 Generalised Algebraic Data Types . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Inference Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.2.1 Repository Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.2 Constraints and Type Reconstruction . . . . . . . . . . . . . . . . . . . . 28
3.2.3 Typing and Constraint Generation . . . . . . . . . . . . . . . . . . . . . 32
3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4 Evaluation 34
4.1 Project Requirements and Success Criteria . . . . . . . . . . . . . . . . . . . . . 34
4.2 Permissiveness of Dromedary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5 Conclusions 40
5.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.2 Lessons Learnt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

Bibliography 41

A Untyped Syntax 46

B Constraints 50

C Type System 57

D Computations 65

E Proposal 68

List of Figures
1.1 An overview of the constraint-based inference pipeline. . . . . . . . . . . . . . . 1

2.1 The syntax-directed ML typing rules. . . . . . . . . . . . . . . . . . . . . . . . . 4


2.2 The inductive rules for semantic interpretation of constraints. . . . . . . . . . . 5
2.3 The constraint generation mapping for ML – ⟦e : τ⟧ is the constraint that holds
if and only if e has the type τ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.4 The PCB typing rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5 The ML typing rules for polymorphic recursion from the Milner-Mycroft calculus
[35]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.1 A selection of Dromedary’s typing rules related to algebraic data types. . . . . . 19


3.2 The tree-based (left) and graphical (right) representations of the type
('a -> 'a) -> 'a -> 'a. . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 A selection of Dromedary’s polymorphic variant typing rules for pattern matching. 24
3.4 The formal definition of the type α expr, originally defined in Listing 2.6. . . . . 25
3.5 The relevant typing rules for GADTs from Dromedary’s type system. . . . . . . 27
3.6 A phase diagram of the OCaml compiler. . . . . . . . . . . . . . . . . . . . . . . 27
3.7 A visualisation of rank-based generalisation [41] for the generated constraint of
let id = fun x → x in id. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.8 The formal syntax of computations and binders. . . . . . . . . . . . . . . . . . . 32

4.1 Benchmarks of various programs using 10000 trials. A subset from the corpus is
used for permissiveness testing. Error bars represent ±2σ. . . . . . . . . . . . . 37
4.2 Benchmarks comparing Dromedary and OCaml’s asymptotic behaviour in classical
exponential cases for ML inference. Shaded areas represent the 95% confidence
interval (±2σ). 10000 trials for (a), 200 trials for (b). . . . . . . . . . . . . . . . 37

List of Listings
2.1 ML let-based polymorphism in action – fun-bound variables are monomorphic,
whereas let-bound variables are polymorphic. . . . . . . . . . . . . . . . . . . . . 4
2.2 The type definitions for a simple language using algebraic data types in OCaml –
expr and bin_op are variant types and binding is a record type. . . . . . 8
2.3 A snippet demonstrating OCaml’s module structures and signatures. . . . . . . 9
2.4 An interpreter for the simple language from Listing 2.2. The implementation of
eval uses mutual recursion and labelled arguments. . . . . . . . . . . . . . 10
2.5 The signatures for functors, applicatives, and monads. . . . . . . . . . . . . . 10
2.6 The type definition of a simple DSL in OCaml using ADTs and GADTs. . . . . 11
2.7 The definition of the equality GADT in OCaml – the type ('a, 'b) eq
encodes a “proof” that 'a is equal to 'b. . . . . . . . . . . . . . . 11
2.8 A type definition for perfect trees in OCaml, taken from [36]. . . . . . . . . . . . 12
2.9 An example of polymorphic recursion in OCaml – requiring an explicit polymor-
phic annotation for decidable type inference. . . . . . . . . . . . . . . . . . . . . 13
2.10 The type definition of dependent associative list in OCaml using GADTs. . . . . 13
2.11 A demonstration of semi-explicit first-class polymorphism in OCaml, encoding
the polymorphic type ∀α.α key → α → α in the elem_mapper type. . . . . . 14
2.12 Extensible error types using polymorphic variants in OCaml, taken from the
project implementation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.1 Examples of annotations in OCaml (on the left) and Dromedary (on the right),
illustrating the differences in the introduction of bounded type variables in
expressions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 The type definition of the 'a expr GADT in Dromedary – new syntax was
introduced for existential variables and explicit constraints. . . . . . . . . . . . . 25
3.3 Desugared (left) versus ppx_let syntax (right) for applicatives (and monads). 29
3.4 A snippet of the Constraints library interface. . . . . . . . . . . . . . . 30
3.5 The module signature for Dromedary’s implementation of the union-find data
structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.6 The module signature for first-order unification structures. . . . . . . . . . . . . 31
3.7 An example of a composable unification structure using OCaml’s functors – the
structure First_order extends a structure S adding (uni-sorted) variables. . 31
3.8 A snippet of Dromedary’s constraint generation illustrating the usage of con-
straints, computations, and binders for clear, compositional, and maintainable
code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

1 Introduction
Since the late 1950s, many popular programming languages have developed type systems and type
checkers. Type checkers give the assurance of type safety: “well-typed programs cannot go
wrong” [33], that is to say, a well-typed program is guaranteed not to violate any type system
properties at runtime. One of the problems with many statically typed languages is that they
require the programmer to annotate their programs with types. Type inference algorithms
alleviate this issue by inferring the type annotations rather than requiring the programmer to
provide them.
Type inference for functional programming languages such as Standard ML, Haskell and
Objective Caml (OCaml) is based on the ML calculus defined by Milner [33], which provides
decidable type inference for let-based polymorphism. Traditionally, inference algorithms for these
languages are extensions of algorithms W or J [33], which use partial substitutions to reason
about first-order equalities between types. However, these algorithms can become extremely
complicated when extending the ML language with additional features.
The purpose of this dissertation is to investigate type inference algorithms using a constraint-
based approach, specifically in the context of OCaml, to reduce the complexity introduced by
these additional features.

1.1 OCaml
OCaml, introduced by Leroy [29], is a popular functional programming language with an
advanced type system. The core language (referred to as Core ML) extends ML with the
following features: mutually recursive let-bindings, algebraic data types, patterns, constants,
records, mutable references (and the value restriction), exceptions and type annotations.
OCaml’s major extensions on Core ML consist of first-class and recursive modules, classes and
objects, polymorphic variants, semi-explicit first-class polymorphism, generalised algebraic data
types (GADTs), the relaxed value restriction, type abbreviations, and labels.
OCaml’s inference algorithm relies on an extension of algorithm W to efficiently deal with
type generalisation, using a technique known as rank-based generalisation [41], with additional
modifications for the above extensions.
However, it is widely accepted that OCaml’s inference algorithm has become overly complex and
difficult to maintain and evolve [52]. Constraint-based type inference proposes to solve these
problems by separating type inference into three distinct phases: constraint generation, solving,
and type reconstruction using a small independent first-order constraint language, making
inference algorithms and theoretical proofs of correctness more modular.
The idea behind constraint-based type inference is elegantly simple: for some arbitrary term M ,
we generate a constraint C such that if C is true, then M is well-typed. After solving C, we
construct M:τ , an explicitly-typed representation of the term M , during type reconstruction.

Constraint Generation → Constraint Solving → Type Reconstruction

Figure 1.1: An overview of the constraint-based inference pipeline.

In this dissertation, we implement a type inference algorithm using a constraint-based approach
for a subset of OCaml, dubbed Dromedary, consisting of Core ML with type annotations, poly-
morphic variants, extensible variants, semi-explicit first-class polymorphism, type abbreviations,
structures, and GADTs.

1.2 Previous Work


Previous work to improve OCaml’s inference algorithm has concentrated on incremental im-
provements to the existing implementation [52]. In contrast, our work is more ambitious and
aims to provide the foundation for a complete rewrite – which we believe to be worthwhile.
Constraint-based inference for the ML type system has been extensively explored in Pierce’s
book [38]. Numerous extensions to OCaml’s type systems have been independently formalised in
a constraint-based setting [13, 18, 17]; however, we will provide the first unified work exploring
constraint-based inference for OCaml’s type system.
Many of the approaches described in [13, 18, 17] lack modularity due to the interleaving of
constraint solving and type reconstruction. Our approach differs in that it builds upon Pottier’s
modular constraint-based inference [39].

1.3 Project Summary


I demonstrate the following in my dissertation:
• I define Dromedary and its type system in a constraint-based setting in Section 3.1.

• I implemented a constraint-based inference algorithm for Dromedary, focusing on modu-


larity and efficiency (Section 3.2).

• I provide empirical evidence for the correctness of Dromedary’s inference algorithm on


a corpus of programs, and demonstrate that Dromedary is equally permissive to
OCaml¹ (Section 4.2).

• In Section 4.3, I compare the performance of Dromedary’s inference algorithm to OCaml’s


(4.12.0), demonstrating that Dromedary is more performant than OCaml.

¹ In the implemented extensions.

2 Preparation
In this chapter, we summarise the key background material for this project. In Section 2.1, we
provide a detailed account of the existing theory for constraint-based type systems. Following
that, we present a tutorial showcasing various features in OCaml that are implemented in
Dromedary. Section 2.3 details the requirements of Dromedary’s implementation and professional
practices followed throughout the project.

2.1 Type Systems


Type systems are defined as a set of axioms and inductive rules (collectively known as typing
rules) that constrain the form of an expression e according to its type τ – the expression
1 ^ " w o r l d " is invalid since an i n t cannot be concatenated with a s t r i n g.
In this section, we will explain the prerequisite theory and concepts required for formalising
Dromedary’s type system in Section 3.1, using the smaller and simpler ML calculus [33] for
pedagogical purposes.
2.1.1 The ML Type System
The ML calculus [33] is presented in its implicitly-typed form, with expressions given by
e ::= x | c | fun x → e | e e | let x = e in e ,
where x and c denote term variables and constants, respectively. The expression fun x → e is
an anonymous function that binds x in e, e1 e2 is function application, and let x = e1 in e2 is a
polymorphic let binding that binds e1 to x in e2 .
Types and type schemes (or polymorphic types), denoted τ and σ, are defined by the following
grammars:
    τ ::= α | τ F
    σ ::= ∀α.τ ,

where α is a type variable, F ::= · → · | . . . is a type former¹ (or type constructor), and τ
represents a (possibly empty) vector of types of some finite length. Free variables fv(·) and
substitutions {τ /α}(·) on types and type schemes are defined as usual [33]. As a notational
shorthand we write α # τ for α ∉ fv(τ ).
Typing judgements in the type system are of the form Γ ` e : τ , read as: the expression e has
the type τ under the typing context Γ. A typing context Γ (or typing environment) is a sequence
of bindings of term variables x to type schemes σ, representing the current scope; free variables
fv(·) and substitutions are applied element-wise over Γ.
Typing rules are defined inductively, written as:

    Γ1 ⊢ e1 : τ1    · · ·    Γn ⊢ en : τn
    ──────────────────────────────────────── (Rule-name)
    Γ ⊢ e : τ

which is read as: if the premises Γi ⊢ ei : τi listed above the bar hold for all 1 ≤ i ≤ n, then the
conclusion Γ ⊢ e : τ below the bar holds.
The ML typing rules are given in Figure 2.1, where ∆ is an implicit typing context for constants.
The two fundamental operations of the ML type system (implicit in the given presentation) are:
¹ The application of the former · → · is written in infix notation: τ1 → τ2.

• Generalisation: The process of converting a monomorphic type τ (or monotype) into a
type scheme σ, by binding the free variables of τ that are not present in Γ with a universal
quantifier ∀α.τ ; this implicitly occurs in the ML-let rule.
• Instantiation: The process of specialising a type scheme σ into a type τ , by substituting
the universally bound type variables α with types τ . This implicitly occurs in the ML-var
and ML-const rule.

    x : ∀α.τ ∈ Γ
    ──────────────── (ML-var)
    Γ ⊢ x : {τ /α} τ

    c : ∀α.τ ∈ ∆
    ──────────────── (ML-const)
    Γ ⊢ c : {τ /α} τ

    Γ ⊢ e1 : τ1 → τ2    Γ ⊢ e2 : τ1
    ───────────────────────────────── (ML-app)
    Γ ⊢ e1 e2 : τ2

    Γ, x : τ1 ⊢ e : τ2
    ──────────────────────────── (ML-fun)
    Γ ⊢ fun x → e : τ1 → τ2

    Γ ⊢ e1 : τ1    α = fv(τ1) \ fv(Γ)    Γ, x : ∀α.τ1 ⊢ e2 : τ2
    ───────────────────────────────────────────────────────────── (ML-let)
    Γ ⊢ let x = e1 in e2 : τ2

Figure 2.1: The syntax-directed ML typing rules.

The ML typing rules read as follows:


• ML-var: If x has the type scheme σ = ∀α.τ in Γ, then x can be judged to have the type of
an instantiation of σ – the monotype {τ /α}τ . ML-const is analogous to ML-var.
• ML-app: If e1 has the function type τ1 → τ2 , and the argument e2 has the parameter type
τ1 , then the application e1 e2 may be judged to have the return type τ2 .
• ML-fun: If, assuming x has the type τ1, the function body e can be judged to have the
type τ2, then we may deduce that the function fun x → e has the type τ1 → τ2.
• ML-let: Supposing e1 has the type τ1 , and that we may generalise τ1 to the type scheme
∀α.τ1 , and that e2 has the type τ2 given that x has the type scheme ∀α.τ1 , then we may
conclude that the expression let x = e1 in e2 has the type τ2 .
Note that fun-bound variables in ML-fun are monotypes, thus polymorphism is only achieved
via the use of let – as demonstrated below.
(* This is ill-typed - since [f] is monomorphic *)
fun f -> (f (fun x -> x)) (f 2)

(* This is well-typed - since [f] is polymorphic *)


let f = ...
in (f (fun x -> x)) (f 2)

Listing 2.1: ML let-based polymorphism in action – fun-bound variables are monomorphic,


whereas let-bound variables are polymorphic.

Type inference is simply the process of finding a type τ for an expression e such that Γ ` e : τ .
Milner’s essential insight for efficient type inference is that the typing rules are syntax-directed;
that is to say, at most one typing rule applies to any given expression. Consequently, the shape
of the derivation tree for proving Γ ` e : τ is uniquely determined by the form of e.

So we can, in effect, run the typing rules “backwards” and “guess” types by introducing new type
variables α; α is then subject to certain constraints induced by subsequent typing rules. Thus,
type inference for ML decomposes into constraint generation followed by constraint solving.
Key Takeaway: ML provides parametric polymorphism with efficient decidable type inference
based on syntax-directed rules. Inference may be split up into constraint generation and constraint
solving phases.

2.1.2 Constraints
The interface between constraint generation and solving is the constraint language. The syntax
of constraints and constrained type schemes [38] is given by the grammar:
C ::= true | false | τ = τ | C ∧ C | ∃α.C
| def x : σ in C | x ≤ τ | σ ≤ τ ,

σ ::= ∀α.C ⇒ τ .
Constraints naturally model a subset of first-order logic equipped with equations between types,
consisting of conjunction and existential quantification.
Type inference for ML requires generalisation and instantiation. In order to permit these
operations, we require the last three constraint constructs and constrained type schemes. The
constraint def x : σ in C associates the (constrained) type scheme σ with x in the constraint C
(where x may appear as a free variable). The constraint x ≤ τ (and σ ≤ τ ) is an instantiation
constraint, read as: τ is an instance of x which holds if type τ is an instance of the scheme
σ associated with x. Constrained type schemes were introduced to avoid the interleaving of
constraint generation and solving since determining the type scheme ∀α.τ requires constraint
solving. We often write τ or ∀α.τ as syntactic sugar for ∀ · .true ⇒ τ and ∀α.true ⇒ τ ,
respectively.
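To make the grammar concrete, it can be transcribed almost directly into a datatype. The sketch below is purely illustrative – the names, and the restriction to a single arrow type former, are assumptions, and it is not the Constraints library interface described later in Section 3.2.2:

(* An illustrative transcription of the constraint grammar above. *)
type type_var = string

type typ =
  | Var of type_var                 (* α *)
  | Arrow of typ * typ              (* τ → τ, the only type former here *)

type constr =
  | True
  | False
  | Eq of typ * typ                 (* τ = τ *)
  | Conj of constr * constr         (* C ∧ C *)
  | Exists of type_var * constr     (* ∃α. C *)
  | Def of string * scheme * constr (* def x : σ in C *)
  | Inst_var of string * typ        (* x ≤ τ *)
  | Inst_scheme of scheme * typ     (* σ ≤ τ *)

and scheme =
  { vars : type_var list            (* ∀α ...      *)
  ; constr : constr                 (* ... C ⇒ ... *)
  ; body : typ                      (* ... τ       *)
  }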
Constraints are a formal logic: a syntax with a semantic interpretation. Semantically, constraints
are interpreted in the Herbrand universe, that is, the set of ground types:
t ::= t F .
A ground assignment ϕ is a partial function from type variables to ground types. Similarly,
an environment ρ is a partial function from term variables x to sets of ground types. The
interpretation of constraints is defined inductively in Figure 2.2, by the judgement ϕ; ρ ⊨ C,
read as: in the environment ρ, the ground assignment ϕ satisfies C.

    ϕ; ρ ⊨ true                (always)
    ϕ; ρ ⊨ τ1 = τ2             if ϕ(τ1) = ϕ(τ2)
    ϕ; ρ ⊨ C1 ∧ C2             if ϕ; ρ ⊨ C1 and ϕ; ρ ⊨ C2
    ϕ; ρ ⊨ ∃α.C                if ϕ, α ↦ t; ρ ⊨ C for some ground type t
    ϕ; ρ ⊨ x ≤ τ               if ϕ(τ) ∈ ρ(x)
    ϕ; ρ ⊨ σ ≤ τ               if ϕ(τ) ∈ (ϕ; ρ)(σ)
    ϕ; ρ ⊨ def x : σ in C      if ϕ; ρ, x ↦ (ϕ; ρ)(σ) ⊨ C

Figure 2.2: The inductive rules for semantic interpretation of constraints.

We interpret the constrained type scheme ∀α.C ⇒ τ under the assignment ϕ and environment
ρ as the set of ground types ϕ′(τ) such that the assignments ϕ and ϕ′ are equal modulo α,
denoted ϕ =\α ϕ′, and ϕ′ satisfies C:

    (ϕ; ρ)(∀α.C ⇒ τ) = { ϕ′(τ) : ϕ =\α ϕ′ ∧ ϕ′; ρ ⊨ C } ,

where assignments ϕ and ϕ′ are said to be equal modulo α, if

    ∀β ∈ (dom(ϕ) ∩ dom(ϕ′)) \ α. ϕ(β) = ϕ′(β) .

A constraint C1 entails C2, written C1 ⊨ C2, if every context that satisfies C1 also satisfies C2.
Similarly, equivalence C1 ≡ C2 holds if the property is bidirectional.

    Entailment     C1 ⊨ C2   iff   ∀ϕ, ρ. ϕ; ρ ⊨ C1 =⇒ ϕ; ρ ⊨ C2 ,
    Equivalence    C1 ≡ C2   iff   ∀ϕ, ρ. ϕ; ρ ⊨ C1 ⇐⇒ ϕ; ρ ⊨ C2 .

As suggested in Section 2.1.1, we can now reduce the problem of type inference in ML to
constraint solving by defining a mapping ⟦e : τ⟧ of candidate typings to constraints, given in
Figure 2.3.

    ⟦x : τ⟧ = x ≤ τ
    ⟦c : τ⟧ = ∆(c) ≤ τ
    ⟦λx.e : τ⟧ = ∃α1 α2. τ = α1 → α2 ∧ def x : α1 in ⟦e : α2⟧          if α1, α2 # τ
    ⟦e1 e2 : τ⟧ = ∃α. ⟦e1 : α → τ⟧ ∧ ⟦e2 : α⟧                           if α # τ
    ⟦let x = e1 in e2 : τ⟧ = (∃α.C) ∧ def x : ∀α.C ⇒ α in ⟦e2 : τ⟧      where C = ⟦e1 : α⟧, if α # τ

Figure 2.3: The constraint generation mapping for ML – ⟦e : τ⟧ is the constraint that holds if
and only if e has the type τ.
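For example, unfolding this mapping on the identity function yields

    ⟦λx.x : τ⟧ = ∃α1 α2. τ = α1 → α2 ∧ def x : α1 in (x ≤ α2) ,

which is satisfied precisely when τ denotes an arrow type whose domain and codomain coincide – that is, exactly when λx.x has the type τ.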

A problem with the definition of constraint generation in Figure 2.3 is that the constraint
C = ⟦e1 : α⟧ occurs twice in ⟦let x = e1 in e2 : τ⟧, which can lead to exponential complexity.
Fortunately, we can avoid this by extending the constraint language with the following construct:

C ::= . . . | let x : σ in C,

semantically defined such that the following equivalence holds:

    let x : ∀α.C1 ⇒ τ in C2 ≡ ∃α. C1 ∧ def x : ∀α.C1 ⇒ τ in C2 .

Linear constraint generation for ⟦let x = e1 in e2 : τ⟧ is now given by

    ⟦let x = e1 in e2 : τ⟧ = let x : (|e1|) in ⟦e2 : τ⟧
    (|e|) = ∀α.⟦e : α⟧ ⇒ α ,

where (|e|) is a principal constrained type scheme for e: it is the most general set of all ground
types that e admits.
Key Takeaway: Constraints form the interface between constraint generation and solving. ML
constraint generation has linear complexity, making constraints suitable for an efficient and
modular implementation of type inference. Extensibility of constraints demonstrates the ability
to shift complexity between the constraint solving and generation phases.

2.1.3 Constraint-Based ML : PCB


In [38], Rémy and Pottier present an alternative formalisation of the ML type system, dubbed
PCB², whose distinctive feature is to exploit the constraints introduced in Section 2.1.2.

    C ⊨ x ≤ τ
    ─────────── (PCB-var)
    C ⊢ x : τ

    C1 ⊢ e1 : τ1 → τ2    C2 ⊢ e2 : τ1
    ─────────────────────────────────── (PCB-app)
    C1 ∧ C2 ⊢ e1 e2 : τ2

    C ⊢ e : τ2
    ──────────────────────────────────────── (PCB-fun)
    def x : τ1 in C ⊢ fun x → e : τ1 → τ2

    C1 ⊢ e1 : τ1    C2 ⊢ e2 : τ2
    ──────────────────────────────────────────────────────────── (PCB-let)
    let x : ∀fv(C1, τ1).C1 ⇒ τ1 in C2 ⊢ let x = e1 in e2 : τ2

    C ⊢ e : τ1
    ───────────────────── (PCB-eq)
    C ∧ τ1 = τ2 ⊢ e : τ2

    C ⊢ e : τ    α # τ
    ──────────────────── (PCB-exists)
    ∃α.C ⊢ e : τ

Figure 2.4: The PCB typing rules.

Judgements take the form C ⊢ e : τ, read as: under the satisfiable assumptions C, the expression
e has the type τ, where the constraint C may contain free type and free term variables. We
identify judgments modulo constraint equivalence of their assumptions, that is, C ⊢ e : τ and
D ⊢ e : τ are equivalent when C ≡ D holds. The type system is described in Figure 2.4.
The PCB-var rule states that x has the type τ under the assumption that τ is an instance of x.
Unlike ML, no typing context is consulted. The environment is implicit within the constraint;
thus, any free term (and type) variables in e also occur in C.
PCB-app is analogous to ML-app. PCB-fun requires the function body e to have the return
type τ2 , under C. In the rule’s conclusion, we wrap C in def x : τ1 in C, assuming parameter
x has type τ1, permitting C to contain instantiation constraints of the form x ≤ τ. PCB-let is
similar to PCB-fun; however, it uses a let constraint to assign x a constrained type scheme; the free
variables fv(C1 , τ1 ) in the quantifier ensure the scheme is closed. Both PCB-eq and PCB-exists
are non-syntax directed rules required for soundness.
Using PCB as the basis of Dromedary’s type system allows us to provide a simpler
formalisation. The inclusion of structured constraints in the typing rules
benefits advanced features that rely on constraints, most notably GADTs (Section 3.1.6).
Additionally, metatheoretic properties such as soundness and completeness [38] of constraint-
based inference for PCB may be stated directly without relying on substitutions, resulting in
straightforward correctness proofs for Dromedary’s inference³.
Key Takeaway: PCB is a purely constraint-based presentation of ML; its advantages include
a simpler and more intuitive formalisation and easier correctness proofs for constraint-based
inference, which Dromedary benefits from.

2.2 OCaml
OCaml is a general-purpose, high-level programming language combining functional, object-
oriented, and imperative paradigms – with one of the most sophisticated and powerful type
inference algorithms available – based on the ML calculus from Section 2.1.1.
In this section, we describe several OCaml design patterns utilised throughout our codebase,
as well as a selection of sophisticated type system features implemented in Dromedary.
² A purely constraint-based type system.
³ In future work.

OCaml extends ML with a wide range of features, including: mutual recursion, algebraic data
types (ADTs), and patterns. Algebraic data types are defined using a combination of records
and variants (so-called product and sum types). For example, Listing 2.2 defines the ADTs
bin_op, expr, and binding.

type bin_op = Add | Sub

type expr =
| Int of int
(** Integer constant [1, -3, ...] *)
| Var of string
(** Variables [x, eval, ...] *)
| Let of { bindings: binding list; in_: expr }
(** Let bindings [let x1 = e1 and ... xn = en in e] *)
| Bin_op of { left: expr; op: bin_op; right: expr }
(** Infix binary operators [e1 + e2, e1 - e2] *)
and binding =
{ var: string
; exp: expr
}
Listing 2.2: The type definitions for a simple language using algebraic data types in OCaml –
expr and bin_op are variant types and binding is a record type.

Types, such as bin_op, can be constructed using data constructors (sometimes referred to as


variants or tags) and deconstructed using pattern matching:
let eval_bin_op op n1 n2 =
match op with
| Add -> n1 + n2
| Sub -> n1 - n2

As of OCaml 3, function arguments may be labelled, distinguished using a tilde ~ – permitting


arguments to be specified by name instead of position (Listing 2.4). OCaml also supports
an imperative style of programming, allowing side-effecting primitives such as exceptions and
mutable references, as seen below in the imperative implementation of the factorial function:
let fact n =
if n < 0 then raise Invalid_argument;
let result = ref 1 in
for i = 1 to n do
result := i * !result
done;
!result

2.2.1 Modules and Functors


OCaml features a powerful module system [28] independent of the core language. Large OCaml
programs are divided into modules, with each module consisting of a set of types and values.
Modules are defined by a structure and a signature (Listing 2.3). The signature defines the
subset of types and values that can be used externally. Types may be specified concretely
(exposing the underlying constructors) or abstractly (hiding the underlying implementation).

Values are specified using their type (signature). A structure is the implementation of a signature
– implementing each type and value specified in the signature.
Components of a module are referred to through qualified identifiers (known as “dot notation”)
or using open Module_name for unqualified access (Listing 2.3).
Modules may be parameterised by other modules using functors⁴, which are (informally)
functions from modules to modules. In Dromedary, functors are foundational for implementing
the modular constraints library.
open Core

module Env : sig


(** Signature for [Env]. *)

(** Abstract type [t] representing environment. *)


type t

(** [add t x n] adds the binding [(x, n)] to


environment [t]. *)
val add : t -> string -> int -> t

(** [find_exn t x] returns the bound value of variable [x]


in [t], raising [Not_found] if [x] is not in [t]. *)
exception Not_found
val find_exn : t -> string -> int
end = struct
type t = (string, int) List.Assoc.t

let add t x n =
List.Assoc.add t x n ~equal:String.equal

exception Not_found
let find_exn t x =
try List.Assoc.find_exn t x ~equal:String.equal
with _ -> raise Not_found
end
Listing 2.3: A snippet demonstrating OCaml’s module structures and signatures.
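Functors themselves are not shown in the listing above. As an illustration only (not taken from the dissertation), the Env structure could be generalised over the type of values it stores, assuming Core is open as above:

(* A hypothetical functor generalising [Env] over its value type. *)
module Make_env (Value : sig type t end) : sig
  type t
  val empty : t
  val add : t -> string -> Value.t -> t
  val find : t -> string -> Value.t option
end = struct
  type t = (string, Value.t) List.Assoc.t

  let empty = []
  let add t x v = List.Assoc.add t x v ~equal:String.equal
  let find t x = List.Assoc.find t x ~equal:String.equal
end

(* Instantiating the functor recovers an integer environment. *)
module Int_env = Make_env (struct type t = int end)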

Key Takeaway: Modules encapsulate and structure large OCaml programs. Functors are
parameterised modules – used to decouple dependencies between modules, increasing modularity.
2.2.2 Functors, Applicatives and Monads
Contrary to their cryptic nomenclature, functors, applicatives, and monads are simply functional
programming design patterns analogous to object-oriented design patterns. They are based on
the concept of composing various operations (or effects) to conceal complexity.
Intuitively, a functor is a polymorphic data structure 'a t that ‘wraps’ values of type 'a,
with a function map which lifts a function f of type 'a -> 'b, to a function on ‘wrapped’
values 'a t -> 'b t (Listing 2.5). An applicative, or applicative functor, extends a functor
by providing: (a) a function return that accepts any value and ‘wraps’ it; (b) an operation
both, that takes two values t1, t2 of types 'a t and 'b t, ‘unwraps’ both their values and
‘re-wraps’ them into a pair, yielding a value of type ('a * 'b) t – allowing independent
operations to be sequenced. Monads extend applicative functors further, adding an operation
bind, which permits the sequencing of dependent operations. Each structure and its operations
must satisfy various laws known as the functor, applicative, and monad laws [55, 31] – which we
omit.
Applicatives are a fundamental abstraction in our constraints library (Section 3.2.2). Additionally,
we use monads extensively in our codebase as a generic design pattern for encapsulating side-
effects, such as explicitly propagating failure using the Result monad.

let rec eval exp ~env =
  match exp with
  | Int n -> n
  | Var x ->
    Env.find_exn env x
  | Let { bindings; in_ } ->
    let env = bind ~env bindings in
    eval ~env in_
  | Bin_op { left; op; right } ->
    let n1 = eval ~env left
    and n2 = eval ~env right in
    eval_bin_op op n1 n2
and bind bindings ~env =
  (* Iterates over [bindings], adding each to [env] using
     [List.fold_right], returning the extended environment. *)
  List.fold_right bindings
    ~init:env
    ~f:(fun { var; exp } env ->
      Env.add env var (eval ~env exp))

Listing 2.4: An interpreter for the simple language from Listing 2.2. The implementation of
eval uses mutual recursion and labelled arguments.

⁴ Not to be confused with functors (Section 2.2.2).
Key Takeaway: Functors, applicatives and monads are functional programming design patterns
(similar to OOP design patterns) that are used to hide complexity by providing the ability to
compose operations (or effects) on ‘wrapped’ values.
module type Functor = sig
type 'a t
val map : 'a t -> f:('a -> 'b) -> 'b t
end

module type Applicative = sig


include Functor
val return : 'a -> 'a t
val both : 'a t -> 'b t -> ('a * 'b) t
end

module type Monad = sig


include Applicative
val bind : 'a t -> f:('a -> 'b t) -> 'b t
end
Listing 2.5: The signatures for functors, applicatives, and monads.
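As a concrete illustration of these signatures (not part of the dissertation’s codebase), the option type admits all three structures; the sketch below implements the Monad signature directly:

(* A hypothetical Option instance of the Monad signature above. *)
module Option_monad : Monad with type 'a t = 'a option = struct
  type 'a t = 'a option

  let map t ~f = match t with None -> None | Some x -> Some (f x)
  let return x = Some x
  let both t1 t2 =
    match t1, t2 with Some x, Some y -> Some (x, y) | _ -> None
  let bind t ~f = match t with None -> None | Some x -> f x
end

(* Sequencing two dependent, possibly-failing computations. *)
let safe_div x y = if y = 0 then None else Some (x / y)
let example = Option_monad.bind (safe_div 10 2) ~f:(fun q -> safe_div q 0)
(* [example] = None: the second division fails and the failure propagates. *)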

2.2.3 Generalised Algebraic Data Types
Generalised algebraic data types (GADTs), introduced by Xi et al. [56], allow one to describe
richer constraints between constructors and their types. The canonical example of GADTs is a
typed domain-specific language (DSL):
(** ADT encoding of [expr] *)
type expr =
| Int of int
| Pair of expr * expr
| Fst of expr
| Snd of expr

(** GADT encoding of [expr] where ['t expr] is [expr]


of type ['t]. *)
type _ expr =
| Int : int -> int expr
| Pair : 'a expr * 'b expr -> ('a * 'b) expr
| Fst : ('a * 'b) expr -> 'a expr
| Snd : ('a * 'b) expr -> 'b expr

Listing 2.6: The type definition of a simple DSL in OCaml using ADTs and GADTs.

The formal details of the GADT definition are explained in Section 3.1.6. The important point
is that it allows us to give a more precise type for each constructor:
• Int n has the type int in the DSL, thus its type is int expr.

• The constructor Pair produces a pair from two expressions of types 'a, 'b, thus its
type is ('a * 'b) expr.

• The constructor Fst projects the first element from a pair 'a * 'b, thus its type is
'a expr.
Thus, with the GADT encoding, expressions such as Fst (Int 1) in OCaml are ill-typed,
avoiding an error-prone programming style. Other compelling applications of GADTs will be
discussed throughout this dissertation (Sections 2.2.5, 3.2.2).
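One payoff of these richer constructor types is that an evaluator can return a value whose type matches the expression’s index. The sketch below is illustrative (not taken from the dissertation) and uses the polymorphic recursion annotation discussed in Section 2.2.4:

(* The return type of [eval] varies with the index of the GADT. *)
let rec eval : type t. t expr -> t = function
  | Int n -> n                            (* here t = int *)
  | Pair (e1, e2) -> (eval e1, eval e2)   (* here t is a pair type *)
  | Fst e -> fst (eval e)                 (* project from a pair expression *)
  | Snd e -> snd (eval e)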
Problems with Inference One of the characteristic features of the ML type system is its
ability to infer the principal (or most general ) type for any well-typed expression.
Among the primary difficulties associated with inference in the presence of GADTs is the loss of
principality. Sulzmann et al. [46] demonstrated that programs with GADTs frequently have
more than one principal type. To illustrate this, we consider the following example:
type (_, _) eq = Refl : ('a, 'a) eq

let coerce eq x = match eq with Refl -> x


Listing 2.7: The definition of the equality GADT in OCaml – the type ('a, 'b) eq encodes
a “proof” that 'a is equal to 'b.

Informally, a value of type ('a, 'b) eq is a “proof” of equality between the type 'a and
'b. Thus, when we pattern match on a term eq with type ('a, 'b) eq, we obtain the
constraint 'a = 'b. Considering coerce in Listing 2.7, if eq has the type ('a, 'b) eq
and x has the type 'a, then it may also have the type 'b. So one may deduce that coerce
has the type ('a, 'b) eq -> 'a -> 'b.
However, there are in fact three principal types for c o e r c e:
• ('a, 'b) eq -> 'c -> 'c
• ('a, 'b) eq -> 'a -> 'b
• ('a, 'b) eq -> 'b -> 'a
This poses various problems. To begin, principality is a central property for efficient type
inference since it allows us to make locally optimal decisions. Second, should a program have
more than one principal type, which should we infer? To circumvent this, we often restrict the
type system or rely on explicit annotations.
As described above, deconstructing GADTs using pattern matching introduces local typing
constraints. However, these constraints may result in differing branch types in a match
expression. Reconciling these types is difficult.
Key Takeaway: GADTs allow richer constraints between data constructors and their types.
However, inference is notoriously difficult, suffering from a loss of principality, irreconcilable
branch types and reliance on polymorphic recursion (Section 2.2.4).

2.2.4 Polymorphic Recursion


In OCaml, unannotated recursive functions are monomorphically recursive, which refers to a
recursive parametric polymorphic function where the type parameters are monomorphic 5 for
each recursive occurrence. For example, we consider the following algebraic data type for a
perfect binary tree [36]:
type 'a perfect_tree =
| Leaf of 'a
| Node of 'a * ('a * 'a) perfect_tree
Listing 2.8: A type definition for perfect trees in OCaml, taken from [36].

We now consider writing a function length : 'a perfect_tree -> int to determine
the number of nodes in a perfect tree. If we were to write length naively, we would receive
a type error: this is due to the monomorphically recursive occurrence of length with the
type '_a perfect_tree -> int, which cannot be applied to t, which has the type
('_a * '_a) perfect_tree.
let rec length t =
  match t with
  | Leaf _ -> 1
  | Node (_, t) ->
    1 + 2 * (length t)

Error: This expression has type
         ('_a * '_a) perfect_tree
       but an expression was expected of type
         '_a perfect_tree

One solution is to make length polymorphically recursive, where each recursive occurrence is a


(non-trivial) instantiation of a polymorphic type scheme. Regrettably, inference for polymorphic
recursion is undecidable [25, 20]. The Milner-Mycroft calculus [35] addresses this by using
programmer-supplied type annotations to provide decidable type inference. See Figure 2.5 and
Listing 2.9 for details.
⁵ '_a is used to denote a weak or monomorphic type variable in OCaml.

Key Takeaway: Polymorphic recursion refers to recursive functions where each recursive
occurrence is a non-trivial instantiation of a type scheme. Notable type system features that
rely on polymorphic recursion include GADTs, region-based memory management [54], and
binding-time analysis [10].

e ::= . . . | let rec x : ∀α.τ = e in e

    Γ, x : ∀α.τ1 ⊢ e1 : τ1    α = fv(τ1) \ fv(Γ)    Γ, x : ∀α.τ1 ⊢ e2 : τ2
    ──────────────────────────────────────────────────────────────────────── (ML-poly-rec)
    Γ ⊢ let rec x : ∀α.τ1 = e1 in e2 : τ2

Figure 2.5: The ML typing rules for polymorphic recursion from the Milner-Mycroft calculus
[35].

let rec length : type a. a perfect_tree -> int =
  fun t -> match t with
  | Leaf _ -> 1
  | Node (_, t) -> 1 + 2 * (length t)
Listing 2.9: An example of polymorphic recursion in OCaml – requiring an explicit polymorphic
annotation for decidable type inference.

2.2.5 Semi-Explicit First-Class Polymorphism


Numerous characteristics contribute to OCaml’s success. Undoubtedly, ML-style polymorphism,
with its decidable type inference, is a substantial benefit. However, there are some cases where
one would like to have first-class polymorphism, as in System F.
To demonstrate this use case, we consider a dependent associative list. Informally, a dependent
associative list is a list of key-value pairs where the type of the value depends on the key. In
OCaml, such a data structure may be encoded using GADTs:
(** Type for keys in associative list. The parameter ['a] is
the type of values that will be associated to the key. *)
type _ key =
| Int : int -> string key
(** Integers are mapped to strings *)
| Float : float -> bool key
(** Floats are mapped to bools *)

(** Type for elements in the associative list. Each


consisting of a key-value pair with the correct
types -- an ['value key] with a value ['value]. *)
type elem = Elem : 'value key * 'value -> elem

type t = elem list

Listing 2.10: The type definition of dependent associative list in OCaml using GADTs.

Now suppose we wish to write a function map that applies a function f to each 'a value for
each 'a key. For instance, we could write:

let map_elem (Elem (key, val_)) ~f =
Elem (key, f key val_)

let map t ~f = List.map t ~f:(map_elem ~f)

However, this is ill-typed in OCaml. As with polymorphic recursion, this is because f occurs
monomorphically in map_elem. Since f must be instantiated with an arbitrary key type 'a,
map_elem’s correct type in System F would be

    map_elem : elem → f:(∀α.α key → α → α) → elem .

Unfortunately, OCaml does not support this form of higher-rank polymorphism, and its inference
is undecidable. In OCaml, we use semi-explicit first-class polymorphism [18] in record types to
introduce these universally quantified types:
type elem_mapper = { f : 'a. 'a key -> 'a -> 'a }

let map_elem (Elem (key, val_)) ~f:mapper =


Elem (key, mapper.f key val_)
Listing 2.11: A demonstration of semi-explicit first-class polymorphism in OCaml, encoding the
polymorphic type ∀α.α key → α → α in the elem_mapper type.
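With the wrapper record in place, the list-level map can now be completed. The following sketch (not from the dissertation) assumes Core’s labelled List.map, as used in the earlier listings:

(* [mapper.f] is polymorphic, so it may be applied at every key type. *)
let map (t : t) ~f:(mapper : elem_mapper) : t =
  List.map t ~f:(fun elem -> map_elem elem ~f:mapper)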

Key Takeaway: Semi-explicit first-class polymorphism provides the ability to express higher-
rank polymorphism, as in System F, using polymorphic records. As a result, we can express
more programs that would otherwise be ill-typed in the Damas-Milner “sweet spot”.

2.2.6 Polymorphic Variants


Polymorphic variants are similar to variant types defined by algebraic data types. Syntactically,
polymorphic variants are distinguished from variants with a leading backtick:

let nan = `Not_a_number

The primary characteristic that differentiates polymorphic variants from variant types is
their ability to be utilised without an explicit type declaration⁶. The polymorphic variant
`Not_a_number, for example, has the inferred type [> `Not_a_number ]. The > symbol
at the beginning of a polymorphic variant type indicates that the type is a lower bound.
We can interpret [> `Not_a_number ] as a variant that at least contains the constructor
`Not_a_number. Similarly, polymorphic variants may also have an upper bound. For instance:

let is_a_number t =
match t with
| `Not_a_number -> false
| `Int _ -> true

has the inferred type: [< `Int of int | `Not_a_number ] -> bool, interpreted as
a variant containing constructors `Int or `Not_a_number, or some subset. These upper
and lower bounds form a subtyping relation on polymorphic variants, giving them their increased
expressiveness compared to ordinary variants. Polymorphic variants may also have no subtyping,
as shown below:
⁶ Sometimes referred to as anonymous variants.

type number = [ `Int of int | `Float of float | `Not_a_number ]

One of the pragmatic uses of polymorphic variants is extensible error types. For example, in
Listing 2.12, the solver_error type extends the unifier_error type.
While polymorphic variants appear to be a superset of ordinary variants, their inference is far
more complex – potentially resulting in cyclic types. Furthermore, due to their more expressive
typing rules, they are less likely to catch type-level bugs⁷.
type unifier_error =
[ `Cyclic_type of Type.t
| `Cannot_unify of Type.t * Type.t
]

type solver_error =
[ unifier_error | `Unbound_variable of string ]
Listing 2.12: Extensible error types using polymorphic variants in OCaml, taken from the
project implementation.
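To illustrate the extensibility, a handler written against unifier_error can be reused verbatim inside a handler for the larger solver_error type by matching on the #unifier_error type pattern. The pretty-printers below are hypothetical, not taken from the project:

(* The [#unifier_error] pattern matches any constructor of that variant
   type, narrowing the value so the smaller handler can be reused. *)
let pp_unifier_error : unifier_error -> string = function
  | `Cyclic_type _ -> "cyclic type"
  | `Cannot_unify (_, _) -> "cannot unify"

let pp_solver_error : solver_error -> string = function
  | #unifier_error as err -> pp_unifier_error err
  | `Unbound_variable x -> "unbound variable: " ^ x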

Key Takeaway: Polymorphic variants enhance the flexibility and modularity of ordinary
variants by leveraging structural polymorphism (subtyping). However, their inference is far more
complex, and they incur a runtime performance cost.

2.3 Requirements Analysis


The analysis of requirements of the core project is given in the Success Criteria of the Project
Proposal (Appendix E). The primary goal of the project was to investigate a constraint-based
approach to type inference for a subset of OCaml with GADTs.
Achieving this goal requires three components. First, the design of a minimal subset of OCaml
with GADTs, dubbed Dromedary. The second is the design and implementation of a modular
constraints library. Finally, the implementation of constraint generation and type reconstruction
using the aforementioned constraints library.
After completing the primary deliverable, many additional language features were added as
stretch requirements. I performed a MoSCoW⁸ analysis of these features, shown in Table 2.1.
Features were ranked by their relevance (records are more relevant than modules), implementa-
tion complexity (side-effecting primitives are easier to implement than semi-explicit first-class
polymorphism), and feasibility within the given time frame. While these extensions are not
essential for the project’s success, their implementation demonstrates the suitability of a modular
constraint-based approach for a larger subset of OCaml.
Following this analysis, I intended to implement all “should-have” and the following “could-have”
extensions: polymorphic variants, semi-explicit first-class polymorphism, and side-effecting
primitives. I had not planned to implement a lexer and a parser, but they proved crucial
for effective unit testing. Additional unanticipated extensions included: type abbreviations,
extensible variants, and structures; although implementing them required extra time, their
inclusion led to improved abstractions across our codebase.
⁷ In practice.
⁸ Must have, Should have, Could have, Won’t have.

MoSCoW Priority    Feature

Must Have          ML type system; GADTs; Constraint-based inference

Should Have        Records; Type annotations; Mutually recursive let-bindings

Could Have         Side-effecting primitives (references, exceptions); Polymorphic
                   variants; Semi-explicit first-class polymorphism; Type
                   abbreviations; Lexer and Parser; Extensible Variants

Won't Have         Objects; Modules

Table 2.1: A MoSCoW [5] analysis of features to include in the project.

2.3.1 Model of Software Development


The project was divided into various milestones, such as independent modules and language
features that could be added iteratively to the primary deliverable. This design lends itself to
the waterfall model of software development [45], where each new feature undergoes a complete
waterfall development cycle – design, implement, integrate, and test.
Milestones were based on the MoSCoW analysis, prioritising the primary deliverable requirements,
subsequently focusing on “should have” and “could have” features. Milestones were divided
into 24 tasks, with each task’s development tracked using a Kanban board [19] on GitHub.

2.3.2 Tools Used


I chose OCaml version 4.12.0 as the implementation language since many phases of constraint-
based inference involve traversing of trees of some form, and hence features such as pattern
matching and algebraic data types were crucial. Additionally, side-effecting operations enable
efficient implementations, the powerful module system facilitates the creation of robust modular
interfaces, and the preprocessor extension (PPX) ecosystem supports the embedding of domain-
specific languages such as a constraint language.
I used Dune [48] as the build system, integrating it with OPAM [53], OCaml’s package manager.
Packages are installed under a local switch (local environment) to avoid clashes with existing
OCaml installations on the same machine. The OCamlFormat tool ensures our codebase uses a
consistent style, promoting readability. Makefile targets are provided as convenient command
aliases for installing, building, and testing the project from scratch.
Git was used for version control, with remotes backed up on GitHub. Development of milestones
took place in branches, creating a total of 23 branches. Several branches were added for
the experimental design of features and implementations. Once I decided on a particular
design/implementation, I completed its waterfall cycle, merging its branch into main. At the
time of writing, the main branch has 242 commits.
A critical stage of each waterfall development cycle is testing. For property-based unit testing, I
used Alcotest [50] and QCheck [8]. The remaining integration and unit tests in the suite were
written using Jane Street’s expect tests library; each test executes a fragment of OCaml code,
captures the result, and compares it to the correct output through textual diff. Rather than
writing the expected output manually, I inspected the captured output of the test case, either

accepting it as correct or rectifying a fault. This was particularly advantageous when writing
tests involving large syntax trees generated by our inference algorithm. In total, I wrote 514
tests.
I built Continuous Integration workflows that automate the execution of regression tests, coverage
checks and formatters for each commit. Coverage was tracked using Coveralls.io [7] and
Bisect [2], achieving a 75% test coverage⁹.
I used LexiFi’s landmarks [49] library for profiling, allowing me to optimise Dromedary’s
constraint solver. In the evaluation, I used Core Bench [3] to micro-benchmark Dromedary and
OCaml: running the type checkers multiple times, finding mean runtime and memory usage
with accuracy bounds.

2.3.3 License
This project is intended as a proof-of-concept and foundation for implementing constraint-based
type inference for OCaml. Therefore, the code was made publicly available on GitHub under
the MIT licence [22] – permitting any person to use, copy, modify, and distribute the software.

2.4 Starting Point


I’m familiar with types, having studied Semantics of Programming Languages. I have no previous
experience in type inference beyond ML’s classical inference algorithms [33]. I have a basic
understanding of constraint solving as a result of studying Prolog and Logic and Proof.
Before starting, I conducted research on OCaml’s type system to investigate the project’s
feasibility. From Foundations of Computer Science and extra-curricular study, I have practical
experience writing OCaml programs. I have previously worked with the OCaml compiler
codebase.

2.5 Summary
We introduced the ML calculus and its constraint-based counterpart PCB in Section 2.1, which
will serve as the foundation of Dromedary’s type system. We discussed various advanced type
system features in OCaml that we implement in Dromedary and some design patterns that are
prevalent in our implementation. We also defended decisions on features, tools, and dependencies
used and our overall software engineering methodology.

⁹ Many unchecked lines included interface and type definitions, resulting in a lower percentage of checked code.

3 Implementation
The implementation of Dromedary is described in two sections. The first (Section 3.1) covers
the design of Dromedary, along with the description of its type system. The second (Section
3.2) explores the practical implementation of Dromedary’s inference, with a particular emphasis
on design decisions that result in efficient and modular constraint-based inference.

3.1 Dromedary
Dromedary is a subset of the OCaml language, supporting all of Core ML: ML polymorphism,
algebraic data types, type annotations, and side-effecting primitives. Additionally, Dromedary
includes many of OCaml’s advanced type system features, namely type abbreviations,
extensible variants, abstract types, polymorphic recursion, semi-explicit first-class polymorphism,
GADTs, and polymorphic variants.
The untyped syntax of Dromedary is given using BNF in Appendix A. We have designed a type
system for Dromedary, based on PCB’s type system presented in Section 2.1.3. It is the
first unified formalisation of (a substantial subset of) OCaml’s type system in a constraint-based
setting.
This section discusses a selection of the aforementioned language features and their formalisation
– referencing selected typing rules. The majority of features are orthogonal and will be discussed
as independent extensions to the ML type system. For the mathematically inclined reader,
Appendix C provides a complete formalisation of the type system.

3.1.1 Algebraic Data Types


Algebraic data types (ADT) are defined as an abstract type α F, with an isomorphism to some
finite sum or product type – which may be recursive. Let K and ℓ range over data constructors
and record labels. Formally, an algebraic data type definition is given by either a variant type or
a record type, whose respective (closed) forms are
    type α F ≅ Σⁿᵢ₌₁ Kᵢ [of τᵢ]        and        type α F ≅ Πⁿᵢ₌₁ ℓᵢ : τᵢ ,

where [S] denotes that the syntactic element S is optional. A structural environment Ψ consists
of a sequence of typing definitions. The (closed) type schemes assigned to K and ℓ in Ψ, which
one may derive from the type definition of F, are written as:
    Ψ ⊢ K : ∀α.[τ →] α F ,
    Ψ ⊢ ℓ : ∀α.τ → α F .
For example, the algebraic data type α perfect tree (Listing 2.8) has the following constructors:
Leaf : ∀α.α → α perfect tree ,
Node : ∀α.α × (α × α) perfect tree → α perfect tree .

We extend the constraint language with instantiation constraints for data constructors K ≤ τ
and labels ℓ ≤ τ, semantically defined such that the following equivalences hold:

    K ≤ [τ1 →] τ2 ≡ ∃α. [τ1 = τ ∧] τ2 = α F        if Ψ ⊢ K : ∀α.[τ →] α F
    ℓ ≤ τ1 → τ2 ≡ ∃α. τ1 = τ ∧ τ2 = α F            if Ψ ⊢ ℓ : ∀α.τ → α F ,

where Ψ is implicit.
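For instance, instantiating these equivalences at the perfect tree constructors above gives:

    Leaf ≤ τ1 → τ2 ≡ ∃α. τ1 = α ∧ τ2 = α perfect tree ,
    Node ≤ τ1 → τ2 ≡ ∃α. τ1 = α × (α × α) perfect tree ∧ τ2 = α perfect tree .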
To support binding multiple variables at once for patterns, we need to introduce the notion of
constrained contexts and fragments. A fragment ∆ is a mapping between term variables and
their types: ∆ ::= · | ∆, x : τ . Fragments intuitively reflect the typing context introduced when
a value is successfully matched against a pattern.
A constrained context Γ ::= ∀α.C ⇒ ∆ specifies a mapping from term variables x to (constrained)
type schemes ∀α.C ⇒ ∆(x). These are required for the efficient (linear) generation of constraints
for let-bound pattern matching. We write ∆ for the context of the form ∀ · .true ⇒ ∆. The
constraint language is suitably extended with constrained contexts:

C ::= . . . | def Γ in C | let Γ in C .

These multi-variadic bindings are semantically equivalent to nested def and let constraints:

def (∀α.C1 ⇒ x1 : τ1, . . . , xn : τn) in C2 ≃ def x1 : ∀α.C1 ⇒ τ1 in . . . def xn : ∀α.C1 ⇒ τn in C2 ,

let (∀α.C1 ⇒ x1 : τ1, . . . , xn : τn) in C2 ≃ let x1 : ∀α.C1 ⇒ τ1 in . . . let xn : ∀α.C1 ⇒ τn in C2 .

The typing judgements for algebraic data types feature three judgements corresponding to
patterns, cases, and expressions. Judgements for patterns and cases are of the form C ⊢ p : τ ⇝ ∆
and C ⊢ p → e : τ1 ⇒ τ2, interpreted as: under the satisfiable assumptions C, the pattern p
has the type τ, binding variables in fragment ∆; and under the satisfiable assumptions C, the
case p → e matches values of type τ1, returning values of type τ2, respectively.
The typing rules in Figure 3.1 may be read as follows:
• Dromedary-pat-var: If the pattern x matches a value of type τ , then it binds x with type
τ in the fragment.

• Dromedary-pat-construct: We check the constructor K instantiates to τ1 → τ2, and the
  pattern p has the type τ1, binding ∆, concluding that the pattern K p has the type τ2,
  binding ∆.

• Dromedary-case: We require the pattern p to have the type τ1, binding ∆ assuming C1
  holds. Then, we check whether the body of the case e has the return type τ2, under
  assumptions C2 – which is permitted to contain assumptions about the bound variables ∆
  from the pattern p due to the def constraint in the conclusion.

(Dromedary-pat-var)
C ⊢ x : τ ⇝ x : τ

C ⊨ K ≤ τ1 → τ2      C ⊢ p : τ1 ⇝ ∆
(Dromedary-pat-construct)
C ⊢ K p : τ2 ⇝ ∆

C1 ⊢ p : τ1 ⇝ ∆      C2 ⊢ e : τ2
(Dromedary-case)
C1 ∧ def ∆ in C2 ⊢ p → e : τ1 ⇒ τ2

Cp ⊢ p : τ1 ⇝ ∆      C1 ⊢ e1 : τ1      C2 ⊢ e2 : τ2
(Dromedary-exp-let)
let ∀fv(Cp, C1, ∆).Cp ∧ C1 ⇒ ∆ in C2 ⊢ let p = e1 in e2 : τ2

Figure 3.1: A selection of Dromedary’s typing rules related to algebraic data types.

• Dromedary-exp-let: This is analogous to the PCB-let rule (Section 2.1.3). The novelty is
the use of a constrained context and pattern judgement to determine the bound variables.

3.1.2 Annotations and Polymorphic Recursion


Dromedary allows programmers to annotate their expressions with a type:

e ::= . . . | (e : τ ) .

The typing rule for expressions of this form is fairly obvious:

C ⊢ e : τ
(Dromedary-exp-constraint)
C ⊢ (e : τ) : τ

However, unlike OCaml, type variables α are not implicitly bound in Dromedary; to be used in
an annotation, a type variable must first be introduced. We chose this design to ensure a more
uniform and principled approach to annotations:

e ::= . . . | forall (type α) → e | exists (type α) → e .

(* flexible variables *)              (* flexible variables *)
let succ (x : 'a) : 'a =              let succ = exists (type 'a) ->
  x + 1                                 fun (x : 'a) : 'a -> x + 1

(* rigid variables *)                 (* rigid variables *)
let id (type a) (x : a) : a =         let id = forall (type 'a) ->
  x                                     fun (x : 'a) : 'a -> x

Listing 3.1: Examples of annotations in OCaml (on the left) and Dromedary (on the right),
illustrating the differences in the introduction of bound type variables in expressions.

Type variables are either bound existentially (flexibly) or universally (rigidly). If α is existentially
bound, then the expressions (fun x → x + 1 : α → α) and (fun x → x : α → α) are well-
typed, whereas if α was universally bound, only (fun x → x : α → α) is well-typed since
(fun x → x + 1 : α → α) is only well-typed for α = int.
The typing rule for the existential form is straightforward, simply binding the variables α using
an existential quantifier in the conclusion:

C ⊢ e : τ      α # τ
(Dromedary-exp-exists)
∃α.C ⊢ exists (type α) → e : τ

The universal case is more difficult. In order to check that forall (type α) → e has the type τ , we
must check that e has the type τ for all instances of α. To express this, we introduce universal
quantification into the constraints language:

C ::= . . . | ∀α.C .

It is semantically defined as:

∀t. (ϕ, α ↦ t); ρ ⊢ C
―――――――――――――――――
ϕ; ρ ⊢ ∀α.C

While universal quantification is sufficient for typing the forall construct, to permit linear
complexity for type checking (and constraint generation) we extend constrained contexts (Section
3.1.1) with universally quantified variables α:
Γ ::= ∀α, β.C ⇒ ∆ ,
where let Γ in C is semantically defined by the equivalence:
let ∀α, β.C1 ⇒ ∆ in C2 ≃ ∀α.∃β. C1 ∧ def ∀α, β.C1 ⇒ ∆ in C2 .

The (linear) typing rule is given by:


C1 ⊢ e : τ1      C2 ⊨ x ≤ τ2
(Dromedary-exp-forall)
let ∀α, fv(τ1).C1 ⇒ x : τ1 in C2 ⊢ forall (type α) → e : τ2
The reader may note the instantiation x ≤ τ2 in Dromedary-exp-forall; this is required because
forall (type α) → e is not necessarily polymorphic: it may be implicitly instantiated to a
monotype. To illustrate this, in OCaml, the following is well-typed:

let id_int (n : int) =
  (fun (type a) (x : a) : a -> x) n

as the expression fun (type a) (x : a) : a -> x is instantiated to int -> int.


With annotations formalised, we can now attack the problem of polymorphic recursion (Section
2.2.4). We begin by discussing recursion in the constraints language, extending it with recursive
def and let forms:
C ::= . . . | def rec x : π in C | let rec x : π in C ,
where π denotes a recursive binding. For recursion in Dromedary, we require two kinds of
recursive bindings¹:
π ::= ∀α.C ⇒ τ | ∀α.C ⇐ τ .
Intuitively, the checking binding ∀α.C ⇐ τ asserts that the binding has the type scheme ∀α.τ in
the recursive constraint C, whereas the synthesising (or inferring) binding ∀α.C ⇒ τ assumes
that the binding has type τ in the recursive constraint C when synthesising the binding’s type
scheme.
Semantically, the interpretation (the set of ground types) of these bindings is defined as:
(ϕ; ρ)(x : ∀α.C ⇒ τ) = { ϕ′(τ) : ϕ =\α ϕ′ ∧ ((ϕ′; ρ, x ↦ ϕ′(τ)) ⊢ C) } ,
(ϕ; ρ)(x : ∀α.C ⇐ τ) = { ϕ′(τ) : ϕ =\α ϕ′ ∧ ((ϕ′; ρ, x ↦ ϕ′(∀α.τ)) ⊢ C) } .
The semantics for recursive def and let constraints are analogous to their non-recursive counter-
parts, using the above interpretations for π-bindings.
Our constraints give rise to the following typing rules for monomorphic and polymorphic
recursion, using the inferred and checked bindings, respectively:
C1 ⊢ e1 : τ1      C2 ⊢ e2 : τ2
(Dromedary-exp-rec-mono)
let rec x : ∀fv(C1, τ1).C1 ⇒ τ1 in C2 ⊢ let rec x = e1 in e2 : τ2

C1 ⊢ e1 : τ1      C2 ⊢ e2 : τ2
(Dromedary-exp-rec-poly)
let rec x : ∀α.C1 ⇐ τ1 in C2 ⊢ let rec x : ∀α.τ1 = e1 in e2 : τ2
¹ Notation inspired by bidirectional type checking [9].
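To make the two recursion rules concrete, the following sketch – written in OCaml syntax, and
assuming the perfect tree definition from Listing 2.8 matches the constructors given in Section
3.1.1 – shows a function whose recursive call occurs at a different instance of its scheme, so it
must be handled by the checking binding of Dromedary-exp-rec-poly rather than the inferring
binding of Dromedary-exp-rec-mono:

type 'a perfect_tree = Leaf of 'a | Node of 'a * ('a * 'a) perfect_tree

(* [depth] calls itself at type ('a * 'a) perfect_tree -> int, so the binding
   must be checked against the explicit scheme 'a. 'a perfect_tree -> int,
   which plays the role of ∀α.τ1 in Dromedary-exp-rec-poly. *)
let rec depth : 'a. 'a perfect_tree -> int = function
  | Leaf _ -> 1
  | Node (_, t) -> 1 + depth t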

3.1.3 Semi-explicit First-class Polymorphism
Recall that OCaml (Section 2.2.5) permits programmers to specify first-class polymorphism
explicitly using records with polymorphic fields, where creating a record {ℓ = e} introduces these
polymorphic values wrapped in the record, and record field access e.ℓ eliminates a polymorphic
value by instantiating it.
In Dromedary, we extend our formalisation of algebraic data types (Section 3.1.1), adding the
polymorphic fields required to express semi-explicit first-class polymorphic types:
type α F ≅ ℓ1 : ∀β1.τ1 × · · · × ℓn : ∀βn.τn ,

where the type scheme for the label ℓ in context Ψ is written as:
Ψ ⊢ ℓ : ∀α.(∀β.τ) → α F .
To ensure that the record field ℓ = e is well-typed, where ℓ : ∀α.(∀β.τ) → α F, we verify that e
has the type τ for some instance of α and all instances of β – as β must be generic. Similarly,
to determine if e.ℓ is well-typed, we check that e has the type α F for some instance of α and
that e.ℓ has the type τ for some instance of β.
This reasoning may be expressed using the existential and universal quantification constraints
introduced in Sections 2.1.2 and 3.1.2, respectively, resulting in the label instantiation constraints
ℓ ≤ σ → τ and ℓ ≤ τ, semantically defined as:
ℓ ≤ τ1 → τ2 ≃ ∃α, β. τ = τ1 ∧ α F = τ2    if Ψ ⊢ ℓ : ∀α.(∀β.τ) → α F ,
ℓ ≤ (∀β.C ⇒ τ1) → τ2 ≃ ∃α. α F = τ2 ∧ ∀β. τ1 = τ ∧ C    if Ψ ⊢ ℓ : ∀α.(∀β.τ) → α F .
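As an illustrative instance (the record type id here is hypothetical): for a type id with a single
polymorphic field, type id ≅ f : ∀β.β → β, the label’s scheme is Ψ ⊢ f : ∀·.(∀β.β → β) → id,
and the equivalences above specialise to:

f ≤ τ1 → τ2 ≃ ∃β. β → β = τ1 ∧ id = τ2 ,
f ≤ (∀β.C ⇒ τ1) → τ2 ≃ id = τ2 ∧ ∀β. τ1 = β → β ∧ C .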

As a result of these constraints, the typing rules for semi-explicit first-class polymorphism are
as follows:
C ⊨ ℓ ≤ τ1 → τ2      C ⊢ e : τ2
(Dromedary-exp-field)
C ⊢ e.ℓ : τ1

C ⊢ e : τ1
(Dromedary-exp-record)
ℓ ≤ (∀β.C ⇒ τ1) → τ2 ⊢ ℓ = e : τ2

3.1.4 Sharing
Sharing is the process of removing repeated types and variables; it is a critical technique for
efficient type inference in ML. It also plays a role in more sophisticated type systems such as
ambivalent types (Section 3.1.6) and MLF [44].
In practice, types are shared by representing them as directed acyclic graphs rather than
trees. To illustrate this, the type ('a -> 'a) -> 'a -> 'a is represented by the graphs
depicted in Figure 3.2. The deduplication of repeated types is key to representing exponentially
sized types using a linear graphical representation, as demonstrated in our evaluation (Section
4.3).
A formal treatment of sharing requires the concept of a shallow type ψ. In a graph-based
description of types (as illustrated below), they are the structure of the internal nodes:
ψ ::= α F ,
where type variables represent pointers. By explicitly specifying variables (pointers), types are
not duplicated. The formal details of graphical types and converting between (deep) types τ
and shallow types are given in Appendices B, C. In type systems (such as Dromedary’s), only
the sharing of variables is significant: the sharing of internal nodes is not – we explain this
further in Section 3.1.6.
Figure 3.2: The tree-based (left) and graphical (right) representations of the type
('a -> 'a) -> 'a -> 'a.

3.1.5 Polymorphic Variants


Intuitively, the type of a polymorphic variant consists of a sequence of labelled types. We call
such a sequence a row. We implement a multi-sorted algebra for rows à la Rémy [42], where
types τ are extended with rows, as shown below:

τ ::= . . . | ℓ : τ :: τ | ∂τ .

∂τ is the infinite row whose type is τ for every label; ℓ : τ :: r is the row consisting of row r,
except that the type for label ℓ is τ. Type variables used within the context of a row are called
row variables, denoted ρ.
Labels within a row are annotated with presence information; a label is either absent or
present with type τ, encoded using the unary type former τ present and the nullary former
absent.
Variants To encode variants, we use the unary type former Σ, where Σ r denotes the type of
a polymorphic variant with row r. OCaml and Dromedary syntactically hide the rows and row
variables in polymorphic variants, simplifying the types exposed to the programmer.
However, this requires encoding our empty, open and closed polymorphic variants into row-based
representation before type inference:

[> ‘K1 [of τ1] | · · · | ‘Kn [of τn]] ≃ Σ (K1 : [τ1] :: . . . :: Kn : [τn] :: ρ)    where ρ is free
[< ‘K1 [of τ1] | · · · | ‘Kn [of τn]] ≃ Σ (K1 : [τ1] :: . . . :: Kn : [τn] :: ∂ absent)
[ ] ≃ Σ (∂ absent) ,

where [τ ] is unit present in the optional case, and τ present otherwise.
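As a small worked instance of this encoding, an open and a closed variant over ‘Nil and
‘Cons of int would be represented as:

[> ‘Nil | ‘Cons of int] ≃ Σ (Nil : unit present :: Cons : int present :: ρ)    where ρ is free
[< ‘Nil | ‘Cons of int] ≃ Σ (Nil : unit present :: Cons : int present :: ∂ absent) .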


Equi-recursive Types Polymorphic variants require equi-recursive types to represent recur-
sive variants such as 'list in the example below:

let rec length t =
  match t with
  | `Nil -> 0
  | `Cons (_, t) -> 1 + length t

has the type [< `Nil | `Cons of 'a * 'list ] as 'list -> int.


Formally, a recursive type has the form µα.τ where α may appear in τ and represents a recursive
occurrence of the type. With equi-recursive types, the recursive type µα.τ and its expansion (or
unfolding) {µα.τ /α}τ are equal ; that is, the two types denote the same type. This allows one

to fold and unfold an equi-recursive type infinitely, making comparisons between equi-recursive
types more difficult. Fortunately, by utilising sharing (Section 3.1.4), we may represent equi-
recursive types using directed cyclic graphs. In practice, this is implemented by removing the
occurs-check in unification (Section 3.2.2).
We extend Dromedary’s types τ with aliases and recursive forms to encode the as construct,
since formalising as directly is challenging:

τ ::= . . . | τ where α = τ | µα.τ ,

where τ1 where α = τ2 defines the alias α = τ2 in τ1. We may encode the type for length as

γ → int where γ = µβ.[< ‘Nil | ‘Cons of α × β] .

A cyclic directed graph depicts this type.


Typing Rules Expressions and patterns are extended with variants:

e ::= . . . | ‘K [e] , p ::= . . . | ‘K [p] .

We introduce subtyping constraints of the form τ ≤ ‘K [of τ ] and τ ≥ ‘K [of τ ] to reason about
the lower (<) and upper (>) bounds of polymorphic variants (Section 2.2.6) using constraints.
Constructing a variant ‘K e (the nullary case being analogous) has a fairly elementary rule:
C ⊨ τ2 ≥ ‘K of τ1      C ⊢ e : τ1
(Dromedary-exp-variant)
C ⊢ ‘K e : τ2

The typing rules for match expressions and cases, given in Figure 3.3, are more involved, with
several edge-cases. Dromedary-variant-match-closed implements closed pattern matching, where
every variant constructor is explicitly handled in a case. For open pattern matching, we require
a default case of the form _ → e in Dromedary-variant-match-open.

C ⊢ e : τe      C ⊨ τe ≤ ‘Ki [of τi]
∀1 ≤ i ≤ n.  Ci ⊢ ‘Ki [pi] → ei : ‘Ki [of τi] ⇒ τ
(Dromedary-variant-match-closed)
C ∧ C1 ∧ · · · ∧ Cn ⊢ match e with ‘Ki [pi] → ei : τ

C ⊢ e : τe      C ⊨ τe ≥ ‘Ki [of τi]
∀1 ≤ i ≤ n.  Ci ⊢ ‘Ki [pi] → ei : ‘Ki [of τi] ⇒ τ
Cn+1 ⊢ en+1 : τ
(Dromedary-variant-match-open)
C ∧ C1 ∧ · · · ∧ Cn+1 ⊢ match e with (‘Ki [pi] → ei | _ → en+1) : τ

[C1 ⊢ p : τ1 ⇝ ∆]      C2 ⊢ e : τ2
(Dromedary-variant-case)
[C1 ∧ def ∆ in] C2 ⊢ ‘K [p] → e : ‘K [of τ1] ⇒ τ2

Figure 3.3: A selection of Dromedary’s polymorphic variant typing rules for pattern matching.

We remark that Dromedary only type checks shallow patterns for polymorphic variants, namely
patterns of the form ‘K p where p does not include a polymorphic variant, whereas OCaml
supports deep patterns by using exhaustive pattern checking [14] to determine whether the
matched variant type τe is closed or open. Exhaustive checking in the presence of GADTs
reduces to proof search [15], which is outside the scope of this dissertation due to its
complexity; nonetheless, we foresee no difficulties incorporating exhaustive checking in our
constraint-based approach.

3.1.6 Generalised Algebraic Data Types


Generalised algebraic data types extend algebraic data types in two ways. The first is an instance
of Läufer and Odersky’s extension to ML with existential types [27]. The second extension
allows us to constrain the type for each constructor (Section 2.2.3); we do so by permitting
constructors to have constrained type schemes with existential variables:

Ψ ⊢ K : ∀α.∃β.C ⇒ [τ →] α F .

For example, one may write the GADT α expr (Listing 2.6), using equality constraints, as shown in
Figure 3.4 and Listing 3.2. The novelty of GADTs lies in the constraint C; in order to use

Int : ∀α.α = int ⇒ int → α expr


Pair : ∀α.∃βγ.α = β × γ ⇒ β expr × γ expr → α expr
Fst : ∀α.∃βγ.α = β ⇒ (β × γ) expr → α expr
Snd : ∀α.∃βγ.α = γ ⇒ (β × γ) expr → α expr

Figure 3.4: The formal definition of the type α expr, originally defined in Listing 2.6.

type 'a expr =


| Int of int constraint 'a = int
| Pair of 'b 'c. 'b expr * 'c expr constraint 'a = 'b * 'c
| Fst of 'b 'c. ('b * 'c) expr constraint 'a = 'b
| Snd of 'b 'c. ('b * 'c) expr constraint 'a = 'c
Listing 3.2: The type definition of the 'a expr GADT in Dromedary – new syntax was
introduced for existential variables and explicit constraints.

a constructor K e, e must have the type τ and the type variables α, β must be instantiated
such that the constraint C is satisfied. Pattern matching now binds local type variables and
constraints: if K p matches a value of type α F, then there exist unknown types β that satisfy
C which may be bound in the fragment of p.
Ambivalent Types Dromedary’s typing discipline for GADTs is based on Garrigue and
Rémy’s ambivalent types [17]. Informally, an ambivalent type ζ is a set of types that are equal
under the local constraints; they are used when the type is ambiguous – namely when |ζ| > 1.
An ambivalent type is said to have leaked if the set of types are no longer equal under the local
constraints. To illustrate this, we consider:

let g (type a) (eq : (a, int) eq) (y : a) =
  match eq with Refl -> if y > 0 then y else 0

where the equality type eq is given in Listing 2.7. The then branch returns y, with type a,
whereas the else branch returns a value of type int. The resultant type is the ambivalent
type ζ = {a, int}, which represents a type that is either a or int. When exiting the scope of
the match branch, ζ is leaked – since the local equality a = int is no longer present in the context!
Ambiguities are eliminated using annotations (Section 3.1.2); for an expression (e : τ ), the
expressions e and (e : τ ) may have differing ambivalent types ζ1 , ζ2 , but τ must be included in
both – to ensure soundness.
Ambivalent types rely on sharing (Section 3.1.4) to guarantee the inference of principal types.
When instantiating a type scheme ∀α.ζ without sharing, we lose the information that all copies
of α must be structurally equal since types that are not structurally equal may be equated due
to local equalities. Sharing recovers this information as each copy of α corresponds to the same
node in the graph-based representation of ζ (Section 3.1.4).
Ambivalent Constraints We now present our novel constraint language, extended with
ambivalent types:
C ::= true | false | C ∧ C | ∀α.C | ∃ζ.C
| ζ = ζ | ψ ⊆ ζ | R =⇒ C

ψ ::= α | ζ F ,
where ζ is an ambivalent type variable and ψ is a shallow type, either consisting of a shallow type
former ζ F or a rigid variable α. R ::= true | R ∧ R | τ = τ defines rigid constraints; constraints
solely consisting of equalities between (rigid) types.
We briefly highlight the new constructs of our language. We introduce existential quantifiers
∃ζ.C for ambivalent type variables. We enforce sharing by preventing (deep) types τ from
occurring in constraints. The ζ = ζ constraint is used in lieu of τ = τ , providing a first-order
equality constraint between ambivalent types; and the subset constraint ψ ⊆ ζ, read as: the
ambivalent type ζ includes the type ψ, is used to define explicitly shared types.
In Dromedary, we restrict the local constraints of GADT types to rigid constraints, hence type
schemes for constructors are of the form:
Ψ ⊢ K : ∀α.∃β.R ⇒ [τ →] α F ,

where variables β are considered rigid. This mimics OCaml’s requirement to annotate GADT
types with rigid variables [16]. We may introduce local rigid constraints using the new implication
constraint R =⇒ C. Semantically, implication constraints also ensure no ambivalent types are
leaked when exiting the scope of the implication.
In practice, our constraint language differs from our presentation here since we implement
ambivalent types using scoped abbreviations, which provides an efficient (linear) consistency and
leakage check. This is the approach used by OCaml (4.12.0). While we do not fully explain this,
it seems important to acknowledge the difference.
Typing Rules We begin by extending the notion of a fragment, introduced in Section 3.1.1,
to generalised fragments. A generalised fragment Θ is a triple, consisting of a context of
existential variables β, a rigid constraint R, and a fragment ∆, written as Θ ::= ∃β.∆ ⇒ R.
These generalised fragments describe all typing information gained from a pattern that includes
GADTs.
The typing rules² (Figure 3.5) extend our presentation of algebraic data types (Section 3.1.1).
Dromedary-pat-construct checks whether the constructor K has the type τ1 → τ2, and binds
local existential variables β and constraints R. The sub-pattern p is checked to have the type τ1,
binding the fragment Θ. The novelty of GADTs requires β and R to be bound in the fragment
of K p, extending Θ, which we write as ∃β.Θ ⇒ R.
² The typing rules presented here differ from the ones given in Appendix C due to sharing, which is a (trivial)
technical detail we omit.
Dromedary-pat-tuple requires each pattern pi in the tuple (p1 , . . . , pn ) of type τ1 × · · · × τn to
have the type τi . Each pattern produces a fragment Θi . The resultant fragment of (p1 , . . . , pn ),
is the concatenation Θ1 × · · · × Θn of the individual fragments.
In Dromedary-case, the pattern p is checked against the matched type τ1 , giving us the fragment
∃β.∆ ⇒ R. The case body e is then checked to have the type τ2 , under the assumptions of R,
using an implication constraint. The local existential variables β are universally quantified since
they represent unknown local types within C2 . We also note that the constraint C1 , which the
pattern is checked under, also assumes the local constraints R – permitting local constraints to
flow between patterns in tuples.

C ⊨ K ≤ ∃β.τ1 → τ2 ⇒ R      C ⊢ p : τ1 ⇝ Θ
(Dromedary-pat-construct)
C ⊢ K p : τ2 ⇝ ∃β.Θ ⇒ R

∀1 ≤ i ≤ n.  Ci ⊢ pi : τi ⇝ Θi
(Dromedary-pat-tuple)
C1 ∧ · · · ∧ Cn ⊢ (p1, . . . , pn) : τ1 × · · · × τn ⇝ Θ1 × · · · × Θn

C1 ⊢ p : τ1 ⇝ ∃β.∆ ⇒ R      C2 ⊢ e : τ2
(Dromedary-case)
∀β. R =⇒ C1 ∧ def ∆ in C2 ⊢ p → e : τ1 ⇒ τ2

Figure 3.5: The relevant typing rules for GADTs from Dromedary’s type system.

3.2 Inference Implementation


The OCaml compiler takes a program, parses it to create a parsetree, and performs type inference
to create an explicitly typed intermediate representation, the typedtree, which is used by the
backend to generate bytecode.
Source code → (Lexer + Parser) → Parsetree → (Type Inference) → Typedtree → (Backend) → Bytecode

Figure 3.6: A phase diagram of the OCaml compiler.


Dromedary re-implements the first two stages. The design of Dromedary embodies the
separation of concerns (SoC) principle [26], separating these stages into more distinct modular
phases using a constraint-based approach. Our implementation focuses on correctness and clarity
over performance (unless specified otherwise).
[Pipeline diagram: Parsetree → Constraint Generation (Section 3.2.3: computations, constraint
generation) → Constraint Solving (Section 3.2.2: constraints with a value, generalisation,
unification, union-find) → Type Reconstruction → Typedtree.]

Our explanation of Dromedary’s type inference is organised according to the constraint pipeline
(Figure 1.1), with Sections 3.2.2 and 3.2.3 structured as illustrated above.

3.2.1 Repository Overview
The top-level project directory consists of the source code src/, tests test/ and benchmarks
benchmark/, with additional files for the Dune build system. Table 3.1 gives an overview of the
repository structure. Within src/, Dromedary is split into a parsing library, a constraints

Library Description Lines

Implements the constraint language and con-


src/constraints 3022
straint solver.

src/constraints/ Contains type definitions, pretty-printing and


717
constraint module types for the constraint language.

Solves constraints, producing their values or an


src/constraints/solver 2108
error.

Contains type definitions and pretty-printing for


src/parsing 1853
parsetree. Contains lexing and parsing code.

Contains type definitions and pretty-printing for


src/typing typedtree. Contains code for constraint genera- 3324
tion and type reconstruction.

tests/ Unit tests using Alcotest and Expect. 6853

benchmarks/ Benchmarks of Dromedary’s inference. 637

Table 3.1: Repository overview. All code is written in OCaml.

library (Section 3.2.2), and a typing library (Section 3.2.3). My project repository broadly
follows the structure of the OCaml compiler, aiding in interoperability with the OCaml compiler
in the future.

3.2.2 Constraints and Type Reconstruction


The design of the Constraints library focuses on modularity and efficiency, preferring
mutable data structures over immutable ones. This approach ensures Dromedary is as
performant as OCaml (Section 4.3). We note that no global mutable state is exposed,
since the constraint language provides an immutable interface.

Modules As mentioned in Section 2.2.1, Dromedary’s constraints library
heavily relies on OCaml’s module system and functors to provide modularity,
internally and externally. The library is split into four parameterised
modules forming a layered abstraction stack: (a) at the bottom is Tarjan’s
efficient union-find data structure [47]; (b) above it is Huet’s unification
algorithm [21]; (c) then Rémy’s efficient rank-based generalisation [41];
(d) at the top, the implementation of the constraint language and solver,
implementing Pottier’s constraints with a value [39].

Constraints with a Value The objective of the constraint solver is to determine whether
the constraints are satisfiable or unsatisfiable (true or false). Unfortunately, this approach
doesn’t work with type reconstruction (or elaboration) – the process of constructing the typedtree.
Many languages using constraint-based inference, such as Haskell [37], resort to combining
the phases of constraint solving and elaboration. However, this approach violates the SoC
principle that Dromedary adheres to. In [39], Pottier proposes an alternative implementation of
constraints that facilitates solving and elaboration in a modular fashion. To allow elaboration,
Pottier extends constraints to return not only satisfiability information but also values.
This gives rise to the notion of an “α-constraint”, a constraint which (if satisfiable) produces a
result of type α. In Dromedary, these constraints are represented as the generalised algebraic
data type 'a Constraint.t:
type _ t =
| True : unit t
| Conj : 'a t * 'b t -> ('a * 'b) t
| Eq : variable * variable -> unit t
| ...
| Def : def_binding list * 'a t -> 'a t
| Let :
'a let_binding list * 'b t
-> ('a term_let_binding list * 'b) t
| Return : 'a -> 'a t
| Map : 'a t * ('a -> 'b) -> 'b t
| Decode : variable -> Decoded.Type.t t

For example, the conjunction constraint Conj (C1, C2) returns a pair of values (v1, v2)
composed of the values returned by C1 and C2 respectively. Following Pottier, we also extend our
constraints language with a Map (C, f) construct which evaluates C to some value v (if
satisfiable), and returns the value f v. We refer the reader to [39] for the complete formal
semantics of constraints with a value. This approach allows Dromedary to express constraint
generation and type reconstruction using the constraint language – in the same place!
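As a small illustration – written directly with the constructors of the GADT above, although in
practice the library exposes these through the combinators described below, and the variables
var1 and var2 are assumed to already be in scope – the following constraint asserts that two
inference variables are equal and, if solvable, returns the decoded type of the first:

(* Conj pairs the (unit) result of the equality with the decoded type;
   Map then discards the unit, keeping only the type. *)
let c : Decoded.Type.t t =
  Map (Conj (Eq (var1, var2), Decode var1), fun ((), ty) -> ty)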
The constraints library (Listing 3.4) is parameterised by the notion of an algebra. Informally, an
Algebra specifies the term variables embedded in the constraint language and the structure
of Dromedary’s types. The constraints library provides an abstract type 'a t for constraints
that produce a value of type 'a, a number of combinators for constructing constraints, and a
solve function that solves the constraint, either returning a value of type 'a or an error.
The constraints language is equipped with return, both (&~) and map combinators,
forming an applicative functor (Section 2.2.2). Dromedary makes extensive use of this abstraction
with Jane Street’s ppx_let [4], which provides syntactic sugar for working with applicatives
(and monads):

(exp1 &~ exp2)                       let%map exp1 = exp1
>>| fun (exp1, exp2) ->              and exp2 = exp2 in
Texp_app (exp1, exp2)                Texp_app (exp1, exp2)

Listing 3.3: Desugared (left) versus ppx_let syntax (right) for applicatives (and monads).

Constraints, like any intermediate representation, may be optimised. Dromedary uses smart
constructors to perform peephole optimisations on constraints. For instance, the equivalence
∀α.∀β.C ≃ ∀α, β.C may be used to reduce the number of (expensive) generalisation operations
performed. Such optimisations are not possible in OCaml’s type checker.
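A minimal sketch of such a smart constructor is shown below; the constructor name Forall and
its argument layout are illustrative rather than Dromedary’s exact internal representation:

(* Peephole optimisation at construction time: building ∀α.(∀β.C) directly
   yields ∀α,β.C, so the solver performs one generalisation instead of two. *)
let forall vars inner =
  match inner with
  | Forall (vars', body) -> Forall (vars @ vars', body)
  | _ -> Forall (vars, inner)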

module Make (Algebra : Algebra) : sig
(** Abstract type for ['a Constraint.t] *)
type 'a t

(** Constraints form an applicative functor *)


include Applicative.S with type 'a t := 'a t
include Applicative.Let_syntax with type 'a t := 'a t

(** Combinators for constructing constraints *)


val ( &~ ) : 'a t -> 'b t -> ('a * 'b) t
...

val solve : 'a t -> ('a, [> Solver.error ]) Result.t


end
Listing 3.4: A snippet of the Constraints library interface.

(** The type ['a t] denotes a node within a given disjoint set.
['a] is the type of the value (descriptor) of the node. *)
type 'a t

val make : 'a -> 'a t


val find : 'a t -> 'a
val union : 'a t -> 'a t -> f:('a -> 'a -> 'a) -> unit
Listing 3.5: The module signature for Dromedary’s implementation of the union-find data
structure.

Union-find Unification is the process of solving equations of the form:

U ::= true | U ∧ U | ∃α.U | τ = τ .

Tarjan’s union-find data structure (Listing 3.5) implements a family of disjoint sets (equivalence
classes of types), each set associated with a descriptor (the representative type), with the
following operations: find t returns the descriptor of set t; union t1 t2 ~f computes
the union of the sets t1, t2, merging their descriptors using f.
Dromedary implements a forest-based structure, consisting of a collection of trees, each tree
representing a disjoint set:
type 'a t = 'a node ref
and 'a node =
| Root of { rank : int; desc : 'a }
| Link of 'a t

An 'a node represents a node in a tree (a set): it is either the root of the tree, containing
the descriptor of the set, or an internal node with no data and a parent node, known as a link.
For quasi-linear time complexity of find and union, we implement path compression
and union by rank [47], the latter not being implemented in OCaml’s type checker.
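The following is a minimal sketch of these two optimisations over the representation above; it is
illustrative and not necessarily the exact code used in Dromedary:

(* Find the root of a node, compressing the path: every link visited is
   redirected to point directly at the root. *)
let rec root t =
  match !t with
  | Root _ -> t
  | Link parent ->
    let r = root parent in
    t := Link r;
    r

let find t =
  match !(root t) with
  | Root { desc; _ } -> desc
  | Link _ -> assert false

(* Union by rank: the shallower tree is linked beneath the deeper one; on a
   tie, the surviving root's rank increases by one. Descriptors are merged
   with [f], as in the interface of Listing 3.5. *)
let union t1 t2 ~f =
  let r1 = root t1 and r2 = root t2 in
  if r1 != r2 then
    match !r1, !r2 with
    | Root { rank = k1; desc = d1 }, Root { rank = k2; desc = d2 } ->
      let desc = f d1 d2 in
      if k1 > k2 then (r2 := Link r1; r1 := Root { rank = k1; desc })
      else if k2 > k1 then (r1 := Link r2; r2 := Root { rank = k2; desc })
      else (r2 := Link r1; r1 := Root { rank = k1 + 1; desc })
    | _ -> assert false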
Unification and Structures Dromedary extends first-order unification with several non-
trivial extensions: (a) unification under a mixed prefix [32], that is, in the presence of existential
and universal quantifiers (Section 3.1.2); (b) the addition of an unscoped equational context A
for type abbreviations; (c) scopes and scoped equational contexts for ambivalence (Section 3.1.6);
(d) rows (Section 3.1.5).
Each extension to unification is independent and thus may be implemented modularly, using
the notion of a structure, which describes the descriptor attached to equivalence classes in
unification. The interface for a structure is given in Listing 3.6, consisting of: (a) an abstract
type for structures 'a t which contains children of type 'a; (b) a function merge, which
is used to equate two structures t1, t2 of type 'a t, within some context ctx of type
'a ctx, returning the resultant merged structure or raising the exception Cannot_merge if
the structures are not compatible; (c) functorial functions (Section 2.2.2) such as map, iter,
and fold used to traverse the structure, performing various element-wise operations.
module type Structure = sig
type 'a t

type 'a ctx


exception Cannot_merge
val merge
: ctx:'a ctx -> equate:('a -> 'a -> unit)
-> 'a t -> 'a t -> 'a t

val map : 'a t -> f:('a -> 'b) -> 'b t


val iter : 'a t -> f:('a -> unit) -> unit
val fold : 'a t -> f:('a -> 'b -> 'b) -> init:'b -> 'b
end
Listing 3.6: The module signature for first-order unification structures.

Structures may be composed and extended using functors (Section 2.2.1). To illustrate this, we
may define a structure called First_order (Listing 3.7) which extends an arbitrary structure
S with variables.
module First_order (S : Structure) : sig
type 'a t =
| Var
| Structure of 'a S.t

include S with type 'a t := 'a t and type 'a ctx = 'a S.ctx
end
Listing 3.7: An example of a composable unification structure using OCaml’s functors – the
structure First_order extends a structure S adding (uni-sorted) variables.
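A plausible implementation of merge for First_order dispatches on the two descriptors: a
variable merges with anything by adopting the other side’s descriptor, while two structures defer
to the underlying S.merge. The following is a sketch of the idea, not necessarily Dromedary’s
exact code:

(* A variable imposes no structure, so the other descriptor wins; two
   structures are merged by the underlying structure [S]. *)
let merge ~ctx ~equate t1 t2 =
  match t1, t2 with
  | Var, t | t, Var -> t
  | Structure s1, Structure s2 -> Structure (S.merge ~ctx ~equate s1 s2)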

The Unifier module, which implements types and unification, is parameterised by a
Structure. Types are defined as cyclic directed graphs where each node contains a structure:

type t = desc Union_find.t


(** Arbitrary [id] field, used for printing & total ordering *)
and desc = { id : int; structure : t Structure.t }

This graphical definition permits equi-recursive types (Section 3.1.5) and sharing (Section
3.1.4), which is key for efficient unification.

Generalisation In the context of constraint solving, generalisation is the process of simplifying
constrained type schemes ∀α.C ⇒ τ to type schemes ∀β.τ 0 , which is performed when solving let
constraints.
For approximately linear time generalisation and instantiation, we implement Rémy’s
efficient rank-based scheme. Each type variable in the constraint is annotated with an integer
level (or rank ), which is used to determine the scope of the variable (in constant time): variables
with level l are bound in the lth nested ∀-quantifier in constrained type schemes of let constraints,
with 0 being the outermost level. For example, the following depicts the levels within the
generated constraint of expression let id = fun x → x in id:

Figure 3.7: A visualisation of rank-based generalisation [41] for the generated constraint of
let id = fun x → x in id – variables in the let-bound scheme sit at level 1, while the enclosing
scope is level 0.

The essential observation is that we cannot generalise variables bound in the enclosing scope,
such as α0, since they may be equated after we exit the current scope – namely in the in id ≤ α0
portion of the above constraint.
When equating two type variables during unification, we reduce their level to the lowest of their
levels (outermost scope). Thus, when generalising (exiting the lth level), we only generalise variables
whose level is greater than or equal to l – variables that are not bound in an enclosing scope.
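A simplified sketch of this level bookkeeping is shown below; the representation (a mutable level
field per unification variable) is illustrative and not Dromedary’s exact data structure:

type ty = Var of var | Arrow of ty * ty
and var = { id : int; mutable level : int; mutable link : ty option }

(* Entering/exiting the scope of a let-binding adjusts the current level. *)
let current_level = ref 0
let enter_level () = incr current_level
let exit_level () = decr current_level

(* When two variables are equated, both take the outermost (lowest) level,
   so neither can be generalised while still reachable from an outer scope. *)
let merge_levels v1 v2 =
  let l = min v1.level v2.level in
  v1.level <- l;
  v2.level <- l

(* On exiting level [l], a variable is generalised only if it is still
   unbound and was introduced at level [l] or deeper. *)
let generalisable v ~level = v.link = None && v.level >= level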

3.2.3 Typing and Constraint Generation


The Typing library implements constraint generation and elaboration for the typedtree, using
the Constraints library. Unlike OCaml’s current Typing library, Dromedary’s focuses on
clarity and correctness, relying on abstractions such as monads and explicit error types over
side-effecting operations, at a potential performance cost.
Computations One of Dromedary’s fundamental (and novel) abstractions in constraint
generation is the idea of a computation. Concretely, computations (and binders) are defined by
the following domain-specific language:

τ ::= τ → τ | τ computation | τ binder | τ constraint | . . . Types


e ::= x | fun x → e | e e | { t } | [ u ] | C | . . . Expressions
t ::= let x = t; t | return e | bind x = u; t | . . . Computation commands
u ::= let x = u; u | return e | sub x = t; u | exists | forall | . . . Binder commands

Figure 3.8: The formal syntax of computations and binders.

A τ computation computes a value of type τ within the context required for constraint generation.
Within computations, one may define the notion of a binder, which represents a context for
binding variables within constraints; for example, ∃α.[·] is a binding context for the variable

α with a ‘hole’ (represented by the binding command exists). Computations and binders both
form monads (Section 2.2.2) with bind (let commands) and return operations.
The computation bind x = u; t applies the binder u of type τ, binding x to its value, and fills u’s
‘hole’ with the constraint returned by the computation t. The binder sub x = t; u computes the
computation t of type τ, binding its value to x and returns the binder u. The DSL is shallowly
embedded in OCaml using ppx_let and OCaml’s let-binding operators for bind and sub
(using let@ and let&, respectively). See Appendix D for the complete embedding.
Constraint Generation Dromedary’s constraint generation utilises α-constraints and com-
putations to express constraint generation and type reconstruction together, resulting in concise,
compositional, and maintainable code that naturally reflects the formal constraint mapping
⟦e : τ⟧ (Figure 2.3).
For example, the following snippet generates constraints for the application exp1 exp2
(following the definition in Figure 2.3) and constructs the respective typedtree fragment
Texp_app (exp1, exp2) in the same code segment:
| Pexp_app (exp1, exp2) ->
(* bind [var] existentially *)
let@ var = exists () in
(* check [exp1] has type [var -> exp_type];
and [exp2] has type [var] *)
let%bind exp1 = lift (infer_exp exp1) (var @-> exp_type) in
let%bind exp2 = infer_exp exp2 var in
return
(let%map exp1 = exp1
and exp2 = exp2 in
Texp_app (exp1, exp2))

Listing 3.8: A snippet of Dromedary’s constraint generation illustrating the usage of constraints,
computations, and binders for clear, compositional, and maintainable code.

3.3 Summary
This chapter began by introducing Dromedary’s type system, the first unified presentation of
OCaml’s type system in a constraint-based setting, which we believe to be (a) more natural than
other presentations for certain features, such as GADTs; (b) better suited to correctness proofs
and formal verification of the type checker, an ongoing field of research [11, 6]. We discussed
various advanced type system features of Dromedary and their constraint-based formalisation,
requiring many novel extensions to the constraints language. The author wishes to emphasise
that Dromedary implements additional features not covered in this section³, including
abstract types, side-effecting primitives, type abbreviations, extensible variants, and structures.
Having discussed the theoretical aspects of Dromedary’s type system and its features, we
explored the practical implementation of Dromedary’s type inference algorithm – focusing on
mechanisms that allow Dromedary to implement SoC. Dromedary’s constraints library
is fundamentally modular, while implementing quasi-linear-time constraint solving⁴,
provided type schemes have bounded size [30]. The typing library, which implements
Dromedary’s constraint-based inference, was designed to focus on clarity and correctness;
permitting the effortless description of constraints and type reconstruction using computations.
³ Due to the page limit.
⁴ Not formally analysed.

4 Evaluation
In this chapter, I evaluate whether the implementation of Dromedary fulfilled the success
criteria outlined in the project proposal (Appendix E). In Section 4.1, I demonstrate that
Dromedary far exceeds the success criteria. Following this, in Section 4.2, I explore the
permissiveness of Dromedary’s type system in comparison to OCaml’s, empirically showing
that Dromedary is as permissive as OCaml. Finally, in Section 4.3, I show that Dromedary
outperforms OCaml in our benchmarks.

4.1 Project Requirements and Success Criteria


Analysing the requirements stated in the preparation chapter (Section 2.3), I have achieved all
my must-have, should-have and could-have requirements (Table 2.1). Considering the
original success criteria detailed in the proposal (Appendix E):
Design Dromedary’s type system; supporting ML with GADTs Dromedary’s type
system far exceeds the minimum requirements of ML with GADTs, implementing all
the features from Core ML and support for semi-explicit first-class polymorphism,
polymorphic variants, type abbreviations, polymorphic recursion, extensible variants, and
structures.

Design the constraint language for Dromedary I successfully designed a constraint lan-
guage capable of expressing all of Dromedary’s features (Section 3.1); often requiring
novel extensions on previous work (Section 2.1.2).

Implement a constraint-based inference algorithm for Dromedary Not only did I imple-
ment a constraint-based inference algorithm for Dromedary, but one that is fundamentally
more modular and more performant than OCaml’s inference algorithm!

Evaluate the permissiveness and efficiency of Dromedary I use the Jane Street Expect
Test and Core Bench library to evaluate the permissiveness and performance of Dromedary’s
inference algorithm, performing 427 experiments.

4.2 Permissiveness of Dromedary


In this section, we discuss the permissiveness of Dromedary’s type system. Since Dromedary
implements a subset of OCaml, we aim to show that Dromedary is equally permissive to
OCaml; that is to say that everything Dromedary successfully type checks, OCaml type checks
and vice versa1 .
Methodology We split Dromedary’s type system into its main constituent features: Core
ML features, semi-explicit first-class polymorphism, polymorphic recursion, GADTs, and
polymorphic variants.
We use a selection of relevant programs from the OCaml compiler test suite [51] for each
feature. If insufficient programs are available from that test suite, then we use a carefully
crafted corpus of programs taken from various academic papers and textbooks.
¹ In the implemented features.

Results We completed a total of 412 tests, summarised in Table 4.1. Each test we performed
concluded that OCaml and Dromedary are equally permissive in the implemented fea-
tures. We briefly discuss our tests and results in two categories: tests from the OCaml test
suite, and tests using examples from other sources:

OCaml testsuite: Of the 631 relevant tests for semi-explicit first-class polymorphism,
GADTs and polymorphic recursion in the OCaml test suite, Dromedary was able to
implement 283 of them.
All tests that we were unable to implement were due to features not supported by
Dromedary:
• 16% (57) were due to interactions with the module system,
• 47% (159) relied on objects and classes,
• 37% (132) involved other miscellaneous features of OCaml that are not supported in
Dromedary.

Other sources: Since OCaml’s test suite lacked representative tests for Core ML features and
polymorphic variants, we relied on other sources for examples.
For Core ML features, we curated a corpus of 111 programs using examples from
Whitington’s ‘OCaml from the very beginning’ [40], Paulson’s ‘ML for the working
programmer’ [23] and the foundations of computer science lecture notes [1]. Similarly, for
polymorphic variants, we used examples from ‘Real-world OCaml’ [34] and various papers
on polymorphic variants [12, 42], resulting in 18 additional programs.
Dromedary was able to correctly type check all of these programs.

In practice, we found that translating programs between Dromedary and OCaml requires only
minor syntactic changes where the syntax differs – for example, scoped annotations (Section
3.1.2). In total, we translated approximately 4,100 lines of OCaml to Dromedary.
Since Dromedary successfully passes all tests relevant to its type system features, we conclude
that Dromedary is equally permissive as OCaml. These are encouraging results, suggesting that
the constraint-based approach used in Dromedary could be integrated with OCaml without
significant backwards compatibility issues, demonstrating the practicality of our work.
Given that we performed 412 tests, we also view these results as empirical evidence for the
correctness of Dromedary’s type system and its implementation.

4.3 Benchmarks
In this section, we discuss the efficiency and asymptotic behaviour of Dromedary; substanti-
ating our claim that Dromedary implements quasi-linear constraint solving and showing that
Dromedary is more performant than OCaml.
Methodology OCaml programs are type-checked using the OCaml compiler. I ensure the
benchmarks are comparable by modifying OCaml’s inference algorithm so that it only type
checks relevant features – for example, disabling inference for objects/classes and modules. This
was achieved by forking the implementation of the OCaml compiler, removing many unnecessary
libraries and modules, and replacing certain functions within the implementation with stubs.
Since Dromedary infers principal types and permits equi-recursive types (Section 3.1.5), OCaml’s
-principal and -rec-types compiler flags are enabled.

Feature / Testsuite file                     Tests (OCaml)    Tests (Dromedary)

Core ML:
whitington.ml 51 51
paulson.ml 5 5
focs.ml 22 22
infer_core.ml 33 33

Semi-explicit First-class
Polymorphism:
poly.ml 141 9
pr7636.ml 3 2
pr9603.ml 2 0
error_messages.ml 10 0

Polymorphic Recursion:
poly.ml 5 5

GADTs:
ambiguity.ml 16 13
ambivalent_apply.ml 3 3
didier.ml 7 5
dynamic_frisch.ml 24 24
gadthead.ml 2 0
name_existentials.ml 12 12
nested_equations.ml 8 2
omega07.ml 56 56
or_patterns.ml 58 0
term_conv.ml 5 5
unify_mb.ml 14 14
principality_and_gadts.ml 38 19
return_type.ml 3 0
yallop_bugs.ml 4 0
unexpected_existentials.ml 16 2
test.ml 84 56
pr*.ml 120 56

Polymorphic Variants:
docs.ml 4 4
garrigue.ml 5 5
real_world_ocaml.ml 7 7
remy.ml 2 2

Table 4.1: A summary of tests in each file for Dromedary and OCaml – consisting of 412 tests.

[Bar chart of inference times (µs, 0–800) for Dromedary and OCaml on the benchmark programs:
insertion sort (perfect tree), iter, gcd, fact, arith, making change, map, lookup, is even and
is odd, length, eval, map elem, coloring.]

Figure 4.1: Benchmarks of various programs using 10000 trials; the programs are a subset of the
corpus used for permissiveness testing. Error bars represent ±2σ.

[Two plots of inference time (µs) against input parameter n, comparing Dromedary and OCaml:
(a) inference with exponentially sized types (linear time axis, n up to 200); (b) inference with
exponentially sized type schemes (logarithmic time axis, n up to 6).]

Figure 4.2: Benchmarks comparing Dromedary and OCaml’s asymptotic behaviour in classical
exponential cases for ML inference. Shaded areas represent the 95% confidence interval (±2σ).
10000 trials for (a), 200 trials for (b).

For the benchmark of each feature, we selected random programs from our permissiveness tests.
We used programs of our devising to examine the asymptotic behaviour of Dromedary and
OCaml.
The benchmarks are automated using the Core_bench micro-benchmarking library [3]. Mea-
surements are split into samples, performing linear regression to predict the execution time.
The primary source of non-determinism in benchmarks are the effects of garbage collection
(GC), which we minimise by ensuring the GC is stabilised between each benchmark. We use a
bootstrapping phase consisting of 10% of the trials to achieve tight error bounds. Measurements
were collected using my personal computer with the following specification:
Processor Intel Core i7-8700 3.20GHz

Memory 16 GB 2133 MHz DDR4 RAM

OS Windows 10 Pro Version 10.0.19043

OCaml Version 4.12.0


Results Figure 4.1 compares the inference times of various programs in Dromedary and
OCaml – taken from our permissiveness experiments. Dromedary is usually more performant
than OCaml, though only marginally. Two factors can explain this:
• Dromedary translates programs into constraints. Like any intermediate representation,
we can optimise constraints – Dromedary employs a variety of peephole optimisations on
constraints (Section 3.2.2) that aim to minimise costly operations such as generalisation.
This, we believe, explains the significant difference in the timing of eval, since ambivalent
types heavily rely on generalisation for consistency checking.
These kinds of optimisations are not possible with OCaml’s existing type checker; they
are a fundamental advantage of a constraint-based approach.

• Dromedary is better optimised due to its more modular approach, specifically in unification
and generalisation; using the union-by-rank optimisation (Section 3.2.2) and more compact
and efficient data structures for generalisation.
Some of these optimisations are possible with OCaml’s existing approach, nevertheless,
implementing them would be a technically demanding task owing to the fragility and
complexity of the type checker in its present state.
Figure 4.2 compares the asymptotic behaviour of OCaml and Dromedary. Dromedary consistently
outperforms OCaml in these benchmarks, more noticeably on smaller input parameters of n.
However, one may remark that asymptotically they behave comparably. Benchmark (a) measures
the inference of the expression:

let id = fun x -> x in id id · · · id    (with id repeated n times)

This expression yields types of exponentially increasing sizes within the typedtree representation.
However, Dromedary and OCaml both type check the expression in quasi-linear time, as seen
in Figure 4.2 (a), owing to their use of sharing (Section 3.2.2). This corroborates our claim that
Dromedary solves constraints in quasi-linear time².
In benchmark (b), we experiment with exponentially sized type schemes, which results in
exponential complexity in time, using the expression:
² Under certain conditions [30].

let pair x f = f x x in
let f0 x = pair x in
let f1 x = f0 (f0 x) in
  ...
let fn x = fn−1 (fn−1 x) in
fun z -> fn (fun x -> x) z

This demonstrates that Dromedary and OCaml both suffer from the exponential complexity of ML
inference [24] when type schemes are unbounded, which no amount of optimisation can prevent.

4.4 Summary
Dromedary exceeded all success criteria, achieving all core requirements and extensions listed
in Section 2.3. In our benchmarks, Dromedary outperformed OCaml, demonstrating the
practicality of a constraint-based approach. Additionally, our findings indicate that Dromedary
and OCaml share the same asymptotic quasi-linear time complexity for inference².
Comparing the permissiveness of Dromedary and OCaml, it was clear from our results that
Dromedary’s type system offered equal expressivity in the implemented features. OCaml
and Dromedary programs only differed on minor syntactic features, with all OCaml programs
successfully translated into Dromedary programs. Notably, this suggests that our type system
and constraint-based approach for inference could be backwards-compatible with the existing
OCaml type checker; however, this is not formally proved. Our experiments into permissiveness
also provided empirical evidence towards the correctness of Dromedary’s type system and its
type checker.

5 Conclusions
The project was a resounding success, surpassing all core success criteria and completing many
of the planned extensions.
This project set out to develop a type inference algorithm for a subset of OCaml using a
constraint-based approach, designed to address the fragility and unnecessary complexity of the
current OCaml type checker.
I introduced Dromedary, a substantial subset of OCaml, whose type system I designed (Section
3.1) based on the PCB type system. I developed an ergonomic constraints language capable of
expressing many advanced type system features in OCaml, with modular constraint solving and
elaboration (Section 3.2).
Dromedary’s implementation was designed with the separation of concerns principle in mind,
which we believe improves clarity, modularity, extensibility and maintainability over the existing
OCaml type checker – an original aim of the project.
I established, experimentally, that Dromedary is equally permissive to OCaml in the imple-
mented features (Section 4.2). Additionally, I demonstrated that Dromedary outperforms
OCaml (Section 4.3), proving the practicality of a constraint-based approach.

5.1 Future Work


Dromedary and its type system provide several potential avenues for future work. Dromedary
could be extended, adding objects [43], modules [28], and other features present in OCaml, such
as PPX. Once extended, Dromedary’s type checker could be integrated with OCaml’s compiler
pipeline.
In this dissertation, we formally presented Dromedary and its type system; however, we did not
explore any of its metatheoretic properties. Formal proofs of properties, such as the soundness
and completeness of constraint generation, are necessary to ensure Dromedary’s type system
and its inference algorithm are theoretically correct.
Additionally, a formal proof does not ensure the correctness of the implementation. Extra work
could be done to verify Dromedary’s implementation using mechanised proof assistants such as
Coq or Agda.

5.2 Lessons Learnt


The timetable proposed in my original project proposal was somewhat optimistic, often under-
estimating additional term work such as supervisions or unit of assessment examinations. As a
result, certain milestones were delayed. Fortunately, I allocated sufficient slack time. However,
an improved timetable would have benefited the author.
The early unit tests required explicitly writing the parsetree of Dromedary programs. In
retrospect, I feel that prioritising the implementation of a lexer and parser would have aided
the initial unit testing, thereby advancing the evolution of Dromedary’s test suite.
Despite my initial exuberance, adding GADTs to Dromedary proved to be the most challenging
milestone to complete, requiring three attempts before our ambivalent types implementation
succeeded. On reflection, I believe it would have benefited the project to leave this feature
as an extension, replacing it with another, less ambitious, extension in the core deliverable’s
requirements.

Bibliography
[1] Jeremy Yallop and Anil Madhavapeddy. Foundations of Computer Science (2021-2022) Course
Notes. url: https://www.cl.cam.ac.uk/teaching/2122/FoundsCS/focs-202122-v1.3.pdf.
[2] Anton Bachin. The Bisect ppx code coverage tool. 2022. url: https://github.com/
aantron/bisect_ppx.
[3] Jane Street Capital. Core bench micro-benchmarking framework. 2022. url: https://
github.com/janestreet/core_bench.
[4] Jane Street Capital. ppx let preprocessor. 2022. url: https://github.com/janestreet/
ppx_let.
[5] Dai Clegg and Richard Barker. CASE method fast-track - a RAD approach. Addison-Wesley,
1994. isbn: 978-0-201-62432-8.
[6] COCTI: Certificable OCaml Type Inference. 2022. url: https://www.math.nagoya-
u.ac.jp/~garrigue/cocti/.
[7] Coveralls.io. 2022. url: https://coveralls.io/.
[8] Simon Cruanes. The QCheck testing framework. 2022. url: https://github.com/c-
cube/qcheck.
[9] Jana Dunfield and Neelakantan R. Krishnaswami. “Complete and easy bidirectional
typechecking for higher-rank polymorphism”. In: ACM SIGPLAN International Conference
on Functional Programming, ICFP’13, Boston, MA, USA - September 25 - 27, 2013. Ed.
by Greg Morrisett and Tarmo Uustalu. ACM, 2013, pp. 429–442. doi: 10.1145/2500365.
2500582. url: https://doi.org/10.1145/2500365.2500582.
[10] Dirk Dussart, Fritz Henglein, and Christian Mossin. “Polymorphic Recursion and Subtype
Qualifications: Polymorphic Binding-Time Analysis in Polynomial Time”. In: Static
Analysis, Second International Symposium, SAS’95, Glasgow, UK, September 25-27, 1995,
Proceedings. Ed. by Alan Mycroft. Vol. 983. Lecture Notes in Computer Science. Springer,
1995, pp. 118–135. doi: 10.1007/3- 540- 60360- 3\_36. url: https://doi.org/10.
1007/3-540-60360-3%5C_36.
[11] Jacques Garrigue. “A certified implementation of ML with structural polymorphism
and recursive types”. In: Math. Struct. Comput. Sci. 25.4 (2015), pp. 867–891. doi:
10.1017/S0960129513000066. url: https://doi.org/10.1017/S0960129513000066.
[12] Jacques Garrigue. “Programming with polymorphic variants”. In: ML Workshop. Vol. 13.
7. Baltimore. 1998.
[13] Jacques Garrigue. “Simple Type Inference for Structural Polymorphism”. In: The Second
Asian Workshop on Programming Languages and Systems, APLAS’01, Korea Advanced
Institute of Science and Technology, Daejeon, Korea, December 17-18, 2001, Proceedings.
2001, pp. 329–343.
[14] Jacques Garrigue. “Typing deep pattern-matching in presence of polymorphic variants”.
In: JSSST Workshop on Programming and Programming Languages. Citeseer. 2004. url:
https://caml.inria.fr/pub/papers/garrigue-deep-variants-2004.pdf.

[15] Jacques Garrigue and Jacques Le Normand. “GADTs and Exhaustiveness: Looking for the
Impossible”. In: Proceedings ML Family / OCaml Users and Developers workshops, ML
Family/OCaml 2015, Vancouver, Canada, 3rd & 4th September 2015. Ed. by Jeremy Yallop
and Damien Doligez. Vol. 241. EPTCS. 2015, pp. 23–35. doi: 10.4204/EPTCS.241.2.
url: https://doi.org/10.4204/EPTCS.241.2.
[16] Jacques Garrigue and JL Normand. “Adding GADTs to OCaml: the direct approach”. In:
Workshop on ML. 2011.
[17] Jacques Garrigue and Didier Rémy. “Ambivalent Types for Principal Type Inference with
GADTs”. In: Programming Languages and Systems - 11th Asian Symposium, APLAS
2013, Melbourne, VIC, Australia, December 9-11, 2013. Proceedings. Ed. by Chung-chieh
Shan. Vol. 8301. Lecture Notes in Computer Science. Springer, 2013, pp. 257–272. doi:
10.1007/978- 3- 319- 03542- 0\_19. url: https://doi.org/10.1007/978- 3- 319-
03542-0%5C_19.
[18] Jacques Garrigue and Didier Rémy. “Semi-Explicit First-Class Polymorphism for ML”.
In: Inf. Comput. 155.1-2 (1999), pp. 134–169. doi: 10.1006/inco.1999 .2830. url:
https://doi.org/10.1006/inco.1999.2830.
[19] GitHub. GitHub project board. 2022. url: https : / / docs . github . com / en / issues /
organizing- your- work- with- project- boards/managing- project- boards/about-
project-boards.
[20] Fritz Henglein. “Type Inference with Polymorphic Recursion”. In: ACM Trans. Program.
Lang. Syst. 15.2 (1993), pp. 253–289. doi: 10.1145/169701.169692. url: https://doi.org/10.1145/169701.169692.
[21] Gérard P. Huet. “A Unification Algorithm for Typed lambda-Calculus”. In: Theor. Comput.
Sci. 1.1 (1975), pp. 27–57. doi: 10.1016/0304-3975(75)90011-0. url: https://doi.
org/10.1016/0304-3975(75)90011-0.
[22] The Open Source Initiative. The MIT licence. url: https://opensource.org/licenses/
MIT.
[23] Barry L. Ives. “ML for the Working Programmer by L. C. Paulson (Cambridge University
Press, 1996)”. In: ACM SIGSOFT Softw. Eng. Notes 22.4 (1997), p. 114. doi: 10.1145/
263244.773584. url: https://doi.org/10.1145/263244.773584.
[24] A. J. Kfoury, Jerzy Tiuryn, and Pawel Urzyczyn. “ML Typability is DEXPTIME-Complete”.
In: CAAP ’90, 15th Colloquium on Trees in Algebra and Programming, Copenhagen,
Denmark, May 15-18, 1990, Proceedings. Ed. by André Arnold. Vol. 431. Lecture Notes in
Computer Science. Springer, 1990, pp. 206–220. doi: 10.1007/3-540-52590-4_50. url:
https://doi.org/10.1007/3-540-52590-4_50.
[25] A. J. Kfoury, Jerzy Tiuryn, and Pawel Urzyczyn. “Type Reconstruction in the Presence of
Polymorphic Recursion”. In: ACM Trans. Program. Lang. Syst. 15.2 (1993), pp. 290–311.
doi: 10.1145/169701.169687. url: https://doi.org/10.1145/169701.169687.
[26] Philip A Laplante. What every engineer should know about software engineering. CRC
Press, 2007.
[27] Konstantin Läufer and Martin Odersky. “Polymorphic Type Inference and Abstract
Data Types”. In: ACM Trans. Program. Lang. Syst. 16.5 (1994), pp. 1411–1430. doi:
10.1145/186025.186031. url: https://doi.org/10.1145/186025.186031.
[28] Xavier Leroy. “A modular module system”. In: J. Funct. Program. 10.3 (2000), pp. 269–
303. url: http://journals.cambridge.org/action/displayAbstract?aid=54525.

[29] Xavier Leroy. The ZINC experiment: an economical implementation of the ML language.
Technical report 117. INRIA, 1990. url: https://xavierleroy.org/publi/ZINC.pdf.
[30] David A. McAllester. “Joint RTA-TLCA Invited Talk: A Logical Algorithm for ML Type
Inference”. In: Rewriting Techniques and Applications, 14th International Conference, RTA
2003, Valencia, Spain, June 9-11, 2003, Proceedings. Ed. by Robert Nieuwenhuis. Vol. 2706.
Lecture Notes in Computer Science. Springer, 2003, pp. 436–451. doi: 10.1007/3-540-44881-0_31.
url: https://doi.org/10.1007/3-540-44881-0_31.
[31] Conor McBride and Ross Paterson. “Applicative programming with effects”. In: J. Funct.
Program. 18.1 (2008), pp. 1–13. doi: 10.1017/S0956796807006326. url: https://doi.
org/10.1017/S0956796807006326.
[32] Dale Miller. “Unification Under a Mixed Prefix”. In: J. Symb. Comput. 14.4 (1992), pp. 321–
358. doi: 10.1016/0747-7171(92)90011-R. url: https://doi.org/10.1016/0747-
7171(92)90011-R.
[33] Robin Milner. “A Theory of Type Polymorphism in Programming”. In: J. Comput. Syst.
Sci. 17.3 (1978), pp. 348–375. doi: 10.1016/0022-0000(78)90014-4. url: https://doi.org/10.1016/0022-0000(78)90014-4.
[34] Yaron Minsky, Anil Madhavapeddy, and Jason Hickey. Real World OCaml - Functional
Programming for the Masses. O’Reilly, 2013. isbn: 978-1-4493-2391-2. url: http://shop.oreilly.com/product/0636920024743.do#tab_04_2.
[35] Alan Mycroft. “Polymorphic Type Schemes and Recursive Definitions”. In: Interna-
tional Symposium on Programming, 6th Colloquium, Toulouse, France, April 17-19, 1984,
Proceedings. Ed. by Manfred Paul and Bernard Robinet. Vol. 167. Lecture Notes in Com-
puter Science. Springer, 1984, pp. 217–228. doi: 10.1007/3-540-12925-1_41. url:
https://doi.org/10.1007/3-540-12925-1_41.
[36] Chris Okasaki. Purely functional data structures. Cambridge University Press, 1999. isbn:
978-0-521-66350-2.
[37] Simon Peyton Jones. Type inference as constraint solving: how GHC’s type inference engine
actually works. Zurihac keynote talk. June 2019. url: https://www.microsoft.com/en-us/research/publication/type-inference-as-constraint-solving-how-ghcs-type-inference-engine-actually-works/.
[38] Benjamin C. Pierce. Advanced Topics in Types and Programming Languages. 2005.
[39] François Pottier. “Hindley-milner elaboration in applicative style: functional pearl”. In:
Proceedings of the 19th ACM SIGPLAN international conference on Functional program-
ming, Gothenburg, Sweden, September 1-3, 2014. Ed. by Johan Jeuring and Manuel
M. T. Chakravarty. ACM, 2014, pp. 203–212. doi: 10.1145/2628136.2628145. url:
https://doi.org/10.1145/2628136.2628145.
[40] Prabhakar Ragde. “OCaml from the Very Beginning, by John Whitington, Coherent Press,
2013. ISBN-10: 0957671105 (paperback), 204 pp”. In: J. Funct. Program. 23.3 (2013),
pp. 352–354. doi: 10.1017/S0956796813000087. url: https://doi.org/10.1017/
S0956796813000087.
[41] Didier Rémy. Extending ML Type System with a Sorted Equational Theory. Research
Report 1766. Rocquencourt, BP 105, 78 153 Le Chesnay Cedex, France: Institut National
de Recherche en Informatique et Automatisme, 1992. url: http://gallium.inria.fr/
~remy/ftp/eq-theory-on-types.pdf.

[42] Didier Rémy. “Typechecking Records and Variants in a Natural Extension of ML”. In:
Conference Record of the Sixteenth Annual ACM Symposium on Principles of Programming
Languages, Austin, Texas, USA, January 11-13, 1989. ACM Press, 1989, pp. 77–88. doi:
10.1145/75277.75284. url: https://doi.org/10.1145/75277.75284.
[43] Didier Rémy and Jerome Vouillon. “Objective ML: An Effective Object-Oriented Extension
to ML”. In: Theory Pract. Object Syst. 4.1 (1998), pp. 27–50.
[44] Didier Rémy and Boris Yakobowski. “From ML to MLF : graphic type constraints with
efficient type inference”. In: Proceeding of the 13th ACM SIGPLAN international conference
on Functional programming, ICFP 2008, Victoria, BC, Canada, September 20-28, 2008.
Ed. by James Hook and Peter Thiemann. ACM, 2008, pp. 63–74. doi: 10.1145/1411204.
1411216. url: https://doi.org/10.1145/1411204.1411216.
[45] W. W. Royce. “Managing the Development of Large Software Systems: Concepts and
Techniques”. In: Proceedings, 9th International Conference on Software Engineering,
Monterey, California, USA, March 30 - April 2, 1987. Ed. by William E. Riddle, Robert
M. Balzer, and Kouichi Kishida. ACM Press, 1987, pp. 328–339. url: http://dl.acm.
org/citation.cfm?id=41801.
[46] Martin Sulzmann et al. Type inference for GADTs via Herbrand constraint abduction.
Tech. rep. Jan. 2008. url: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.
1.1.142.4392.
[47] Robert Endre Tarjan. “Efficiency of a Good But Not Linear Set Union Algorithm”.
In: J. ACM 22.2 (1975), pp. 215–225. doi: 10.1145/321879.321884. url: https://doi.org/10.1145/321879.321884.
[48] The Dune Team. OCaml Dune build system. 2022. url: https://dune.build/.
[49] The LexiFi Team. Landmarks profiling framework. 2022. url: https://github.com/
LexiFi/landmarks.
[50] The Mirage Team. Alcotest testing framework. 2022. url: https://github.com/mirage/
alcotest.
[51] The OCaml Team. The OCaml Compiler Testsuite. url: https://github.com/ocaml/
ocaml/tree/trunk/testsuite.
[52] The OCaml Team. TODO for the OCaml type-checker implementation. 2020. url: https:
//github.com/ocaml/ocaml/blob/4.12.0/typing/TODO.md.
[53] The OPAM Team. OCaml package manager OPAM. 2022. url: https://opam.ocaml.
org/.
[54] Mads Tofte and Jean-Pierre Talpin. “Implementation of the Typed Call-by-Value lambda-
Calculus using a Stack of Regions”. In: Conference Record of POPL’94: 21st ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Portland,
Oregon, USA, January 17-21, 1994. Ed. by Hans-Juergen Boehm, Bernard Lang, and
Daniel M. Yellin. ACM Press, 1994, pp. 188–201. doi: 10.1145/174675.177855. url:
https://doi.org/10.1145/174675.177855.
[55] Philip Wadler. “Comprehending Monads”. In: Math. Struct. Comput. Sci. 2.4 (1992),
pp. 461–493. doi: 10.1017/S0960129500001560. url: https://doi.org/10.1017/
S0960129500001560.

[56] Hongwei Xi, Chiyan Chen, and Gang Chen. “Guarded recursive datatype constructors”. In:
Conference Record of POPL 2003: The 30th SIGPLAN-SIGACT Symposium on Principles
of Programming Languages, New Orleans, Louisiana, USA, January 15-17, 2003. Ed. by
Alex Aiken and Greg Morrisett. ACM, 2003, pp. 224–235. doi: 10.1145/604131.604150.
url: https://doi.org/10.1145/604131.604150.

A Untyped Syntax
Dromedary is a representative subset of OCaml (the source syntax differs only notationally), defined by the following grammar:

str ::= str item; ; Structure

str item ::= Structure item


| vd Value definition
| type td and . . . and td Type declaration
| type α T += constr decl | . . . | constr decl Type extension
| external x : σ = string literal prefixed by % Primitive declaration
| exception K [of τ ] Exception declaration

vb ::= (type α) p = e Value binding


vd ::= let [rec] vb and . . . and vb Value definition

td ::= α T td kind Type declaration


td kind ::= Type declaration kind
|ε Abstract type
| ... Open type
|= τ Type alias
|= constr decl | . . . | constr decl Variant type
|= { label decl ; . . . ; label decl } Record type
constr decl ::= Constructor declaration
| K [of β.τ ] [constraint E]
label decl ::= Label declaration
| ` : β.τ
E ::= Equations
| τ = τ and . . . and τ = τ

τ ::= Type
|α Type variable
|τ →τ Function type
|τ T Applied type constructor
| τ × ... × τ Tuple type
|[ρ] Polymorphic variant type
| (τ ) Parenthesis
ρ ::= Rows
| (<| ε |>) ‘K [of τ ]
σ ::= Scheme
| α.τ

c ::= Constant
| true
| false
| () Unit
| string literal, e.g. ”Hello World”
| float literal, e.g. 3.14 or .25
| int literal, e.g. 42 or -12

e ::= Expression
|x Variable
|c Constant
| fun p → e Function
|ee Function application
| vd in e Let binding
| (e) Parenthesis
| uop e Unary operator primitive
| e bop e Binary operator primitive
| if e then e else e If
| forall (type α) → e Universal quantifier
| exists (type α) → e Existential quantifier
| (e : τ ) Annotation
| { ` = e ; ... ; ` = e } Record
| e.` Record field access
| (e, . . . , e) Tuple
| K [e] Constructor
| match e with (h | . . . | h) Match
| try e with (h | . . . | h) Try
| e; e Sequence
| for p = e (to | downto) e do e done For loop
| while e do e done While loop
| ‘K [e] Variant
uop ::= Unary operator
|- Negation
|! Dereference
| ref Reference creation
bop ::= Binary operator
|+ Integer addition
|− Integer subtraction
|× Integer multiplication

|/ Integer division
| := Assignment

p ::= Pattern
| Wildcard
|c Constant
|x Variable
| K [p] Constructor
| (p, . . . , p) Tuple
| (p : τ ) Annotation
| ‘K [p] Variant
| (p) Parenthesis
h ::= Case
|p→e

where:

K Constructor, e.g. Nil, Cons


x Variable, e.g. type_decl, exp'
α, β Type variable, e.g. ’a, ’b
T Type constructor, e.g. int, list
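
For readers who prefer code to grammars, the untyped syntax corresponds closely to a family of OCaml algebraic data types. The sketch below is illustrative only: it covers a reduced set of cases, and the constructor names are assumptions rather than Dromedary's actual definitions.

(* Illustrative sketch only: a reduced subset of the untyped syntax as
   OCaml algebraic data types. Constructor names are hypothetical. *)
type core_type =
  | Type_var of string                      (* type variable, e.g. 'a *)
  | Type_arrow of core_type * core_type     (* tau -> tau *)
  | Type_constr of core_type list * string  (* applied constructor, e.g. int list *)
  | Type_tuple of core_type list            (* tau * ... * tau *)

type constant =
  | Const_bool of bool
  | Const_unit
  | Const_int of int
  | Const_string of string

type pattern =
  | Pat_any                                  (* wildcard *)
  | Pat_var of string
  | Pat_const of constant
  | Pat_construct of string * pattern option (* K [p] *)
  | Pat_tuple of pattern list
  | Pat_annot of pattern * core_type         (* (p : tau) *)
  | Pat_variant of string * pattern option   (* `K [p] *)

type expression =
  | Exp_var of string
  | Exp_const of constant
  | Exp_fun of pattern * expression
  | Exp_app of expression * expression
  | Exp_let of value_binding list * expression
  | Exp_if of expression * expression * expression
  | Exp_annot of expression * core_type
  | Exp_tuple of expression list
  | Exp_construct of string * expression option
  | Exp_match of expression * case list
  | Exp_variant of string * expression option

and case = { lhs : pattern; rhs : expression }           (* p -> e *)
and value_binding = { pat : pattern; exp : expression }  (* p = e *)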

B Constraints
The complete constraints language is defined by the following grammar:

C ::= Constraint
| true Truth
| false Falsehood
|C ∧C Conjunction
| ∃α.C Existential quantification
| ∀α.C Universal quantification
| ∃ζ.C Existential ambivalent quantification
|ζ=ζ Equality
|ψ⊆ζ Subset
| ζ :> τ Ambivalent coercion
| R =⇒ C Rigid implication
| def Γ in C Explicit substitution
| let Γ in C Let binding
|x≤ζ Variable Instantiation
|σ≤ζ Scheme Instantiation
| def rec Π in C Recursive def binding
| let rec Π in C Recursive let binding

R ::= Rigid constraint


| true Truth
|R∧R Conjunction
|τ =τ Equality

where

σ ::= ∀α, ζ.C ⇒ τ Constrained type scheme

∆ ::= · | ∆, x : ζ Fragment
Γ ::= ∀α, ζ.C ⇒ ∆ Constrained context

π ::= Recursive binding


| x : ∀α, ζ.C ⇒ ζ Inferred binding
| x : ∀α.C ⇐ τ Checked binding
Π ::= Recursive context
|· Empty context
| Π, π Context snoc

ψ ::= Shallow type
|α Type variable
|ζ F Shallow type former
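
The constraint grammar can likewise be pictured as an OCaml datatype. The sketch below is a deliberately simplified, first-order rendering: variables are represented by plain names, the recursive def/let forms are omitted, and the real implementation's ['a Constraint.t] (Appendix D) additionally indexes each constraint by the value it elaborates to. All names here are assumptions of the sketch.

(* Illustrative sketch only: a first-order rendering of the constraint
   language. Recursive def/let bindings are omitted. *)
type rigid_var = string      (* rigid type variables, alpha *)
type flexible_var = string   (* ambivalent type variables, zeta *)

type deep_type =
  | Dvar of rigid_var
  | Dformer of deep_type list * string        (* applied type former, tau F *)

type shallow_type =
  | Svar of rigid_var                         (* alpha *)
  | Sformer of flexible_var list * string     (* zeta F *)

type rigid_constraint =
  | Rtrue
  | Rconj of rigid_constraint * rigid_constraint
  | Req of deep_type * deep_type              (* tau = tau *)

type constr =
  | True
  | False
  | Conj of constr * constr
  | Exists_rigid of rigid_var list * constr       (* exists alpha. C *)
  | Forall of rigid_var list * constr             (* forall alpha. C *)
  | Exists of flexible_var list * constr          (* exists zeta. C *)
  | Eq of flexible_var * flexible_var             (* zeta = zeta *)
  | Subset of shallow_type * flexible_var         (* psi "subset of" zeta *)
  | Coerce of flexible_var * deep_type            (* zeta :> tau *)
  | Implies of rigid_constraint * constr          (* R ==> C *)
  | Def of context * constr                       (* def Gamma in C *)
  | Let of context * constr                       (* let Gamma in C *)
  | Instance_var of string * flexible_var         (* x <= zeta *)
  | Instance_scheme of scheme * flexible_var      (* sigma <= zeta *)

and scheme =
  { s_rigid : rigid_var list
  ; s_flexible : flexible_var list
  ; s_constr : constr
  ; s_body : deep_type                            (* forall alpha, zeta. C => tau *)
  }

and context =
  { c_rigid : rigid_var list
  ; c_flexible : flexible_var list
  ; c_constr : constr
  ; c_bindings : (string * flexible_var) list     (* forall alpha, zeta. C => Delta *)
  }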

The Algebra of Types


In this section, we formally discuss the multi-sorted algebra of types and their semantic interpre-
tation in Dromedary’s constraint language.
The language of types is sorted, where the form of types τ is constrained by their sort s – for
example, the type Σ int is invalid since Σ expects a row type. The grammar of sorts s (or kinds)
is given by:

s ::= Sorts
|? Type sort
| row(L) Row sort

where L is the enumerable set of labels and L ⊆ L. The sort ? is for basic types, such as int,
and row(L) for row types not containing labels in L.
The grammar of the multi-sorted algebra of types τ and type formers F is defined as:

τ ::= Types
|α Type variable
|τ F Applied type former
| `L : τ :: τ Row cons
| ∂ Lτ Row uniform
| µα.τ Equi-recursive type

F ::= Type former


| Ts Constructor former
| Σ· Variant former

where T denotes a basic type constructor – in the dissertation we do not explicitly distinguish
type formers from type constructors as they are often treated the same in many contexts.
Let S be a signature for basic type constructors T, defining an arity function arityS mapping
type constructors T to their arity n ∈ N. A sorting context Γ is a sequence of bindings of type
variables α to sorts s.
Ill-sorted types are prevented using sorting judgements of the form S; Γ ` τ :: s, read as: the type
τ has the sort s in the context Γ and signature S. Like typing rules they are defined inductively,

as shown below:
(Type-var)
S; Γ ` α :: Γ(α)

arityS (T ) = n ∀1 ≤ i ≤ n.S; Γ ` τi :: s
(Type-former-constr)
S; Γ ` τ Ts :: s

S; Γ ` τ :: row(∅)
(Type-former-variant)
S; Γ ` Σ τ :: ?

S; Γ ` τ1 :: ?    S; Γ ` τ2 :: row(L ∪ {`})    L ⊆fin L \ {`}
(Type-row-cons)
S; Γ ` (`^L : τ1 :: τ2) :: row(L)

S; Γ ` τ :: ?
(Type-row-uniform)
S; Γ ` ∂ L τ :: row(L)

S; Γ, α : s ` τ :: s
(Type-mu)
S; Γ ` µα.τ :: s
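
For example, with a signature S in which arityS(int) = 0 and arityS(list) = 1, rule Type-former-constr gives S; Γ ` int :: ? (the argument premises are vacuous) and, applying it again, S; Γ ` int list :: ?. Conversely, Σ int is ill-sorted: Type-former-variant requires its argument to have sort row(∅), whereas int here has sort ?.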

The superscripts in the algebra of types ensure that symbols are not overloaded and that each
symbol has a unique sort or signature; however, we often omit these superscripts for clarity (as
in Appendix C).
Our algebra is associated with an equational theory E, defined by the following set of axioms:
Commutativity Labels within row cons and row uniform types may be permuted. That is,
for all labels `1 , `2 ∈ L, finite subset of labels L ⊆fin L \ {`1 , `2 }, and types τ1 , τ2 , τ3 , the
following axioms hold:

(Type-eq-comm-row-cons)
`1^L : τ1 :: (`2^(L∪{`1}) : τ2 :: τ3) = `2^L : τ2 :: (`1^(L∪{`2}) : τ1 :: τ3)

(Type-eq-comm-row-uniform)
∂^L τ1 = `1^L : τ1 :: ∂^(L∪{`1}) τ1

Distributivity Basic type constructors T may be lifted – for instance ρ1 → ρ2 (where ρ1 , ρ2


are row types) is interpreted as the row type obtained by applying the constructor · → ·
point-wise to the row types ρ1 , ρ2 .
As a result of this property, the equational theory has the following axiom:
(Type-eq-distrib)
(`^L : τ1 :: ρ1, . . . , `^L : τn :: ρn) T^row(L) = `^L : (τ1, . . . , τn) T^? :: (ρ1, . . . , ρn) T^row(L∪{`})

for any ` ∈ L and L ⊆fin L \ {`}.

Equi-recursive equivalences Equi-recursive types may be folded and unfolded infinitely:


(Type-eq-equi-fold/unfold)
µα.τ = {µα.τ /α}τ

τ1 = {τ1 /α}τ τ2 = {τ2 /α}τ


(Type-eq-equi-uniqueness)
τ1 = τ2

Semantics We now formally define the semantic interpretation of types. Informally, the
model consists of graphical ground types generated by the grammar. However, the inclusion of
rows and equi-recursive types complicates matters.
We describe our graphical types using the notion of paths. A path π is a sequence of integers or labels. The empty path is denoted ε, and the concatenation of the path π1 followed by π2 is written π1 · π2.
A graphical term t over a signature S is defined as a non-empty partial function from paths to S that is prefix-closed and well-sorted. The subterm of t rooted at π, written t \ π, is the function π′ ↦ t(π · π′). The signature of Dromedary’s graphical types Sdrom is given by:

Symbol    Signature / Sort
T         ?^arityS(T) → ?
Σ         row(∅) → ?
L         ?^|L\L| → row(L), where L ⊆fin L

Thus, we define a graphical type t as a graphical term over the signature Sdrom with a finite number of distinct subterms (this permits cyclic types with a finite encoding). We write T for the set of graphical types. The set of graphical types of sort s is defined as T_s = {t ∈ T : Sdrom ` t :: s}.
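
For example, treating the arrow · → · as a binary basic type constructor, the ground type int → int is the graphical term t with domain {ε, 1, 2}, where t(ε) = (→), t(1) = int and t(2) = int; its subterm t \ 1 is the graphical term for int. A cyclic type such as µα.(int → α) maps every path of the form 2 · · · 2 to (→) and every such path extended with 1 to int; it has only two distinct subterms, and is therefore a graphical type with a finite encoding.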
The interpretation of a type τ of sort s, under the ground assignment ϕ (Section 2.1.2), written ϕ(τ^s), is defined as follows:

ϕ(α^s) = ϕ(α)

ϕ(τ T^?) = t ∈ T_?
    s.t. t(ε) = T
       ∧ ∀1 ≤ i ≤ arityS(T). t \ i = ϕ(τi^?)

ϕ(τ T^row(L)) = t ∈ T_row(L)
    s.t. t(ε) = L
       ∧ ∀` ∈ L \ L. t(`) = T
       ∧ ∀` ∈ L \ L, 1 ≤ i ≤ arityS(T). t \ (` · i) = ϕ(τi^?)

ϕ((Σ τ)^?) = t ∈ T_?
    s.t. t(ε) = Σ
       ∧ t \ 1 = ϕ(τ^row(∅))

ϕ((∂^L τ)^row(L)) = t ∈ T_row(L)
    s.t. ∀` ∈ L \ L. t \ ` = ϕ(τ^?)

ϕ((`^L : τ1 :: τ2)^row(L)) = t ∈ T_row(L)
    s.t. t(ε) = L
       ∧ t \ ` = ϕ(τ1^?)
       ∧ ∀`′ ∈ L \ (L ∪ {`}). t \ `′ = ϕ(τ2^row(L∪{`})) \ `′

ϕ((µα.τ)^s) = t ∈ T_s
    s.t. t = (ϕ, α ↦ t)(τ^s)

Type Abbreviations
A type abbreviation is a type constructor T with an isomorphism α T ≅ τ_T, where α is a tuple of (disjoint) type variables such that fv(τ_T) ⊆ α.
To reason about equalities in the presence of type abbreviations, we seek to develop rewriting
strategies that carry out the ‘expansions’ of abbreviations.

Head expansion The type abbreviation α T ≅ τ_T defines a rewriting rule t1 B_T t2 between graphical types, given by:

t1(ε) = T    t2 = {t1(i)/αi : 1 ≤ i ≤ arityS(T)}(τ_T^?)
(Abbrev-head)
t1 B_T t2

Intuitively, the rule defines the expansion of the head of t1, yielding the type t2.

Contextual expansion Similarly, the rewriting rule t1 ⇝_T t2 for the abbreviation α T ≅ τ_T is defined as:

t1 \ π B_T t2 \ π
(Abbrev-expand)
t1 ⇝_T t2

This rewriting rule applies head expansion in some context (at some path π) within t1, resulting in the expansion t2.
The reflexive transitive closure of ⇝_T is denoted ⇝_T^*, and we define the complete expansion relation ⇝_T^∞ by the equivalence: t1 ⇝_T^∞ t2 if and only if t1 ⇝_T^* t2 and t2 cannot be expanded further.

An abbreviation context A is defined as a sequence of type abbreviations A ::= · | A, α T ≅ τ_T. We extend our rewriting rules to expand any abbreviation in the context A, resulting in the relations B_A and ⇝_A.
Since abbreviations introduce new equivalences, these must be taken into account when resolving equalities between types in constraints. Thus, with the abbreviation context A, equality =_A corresponds to structural equality modulo the equivalence relation induced by the expansion of abbreviations in A, that is:

t1 ⇝_A^∞ t    t2 ⇝_A^∞ t
(Abbrev-eq)
t1 =_A t2
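
As a small worked example, suppose (hypothetically) that A contains the abbreviation α pair ≅ α × α. Head expansion rewrites the graphical type int pair to int × int, substituting int for α in the body of the abbreviation, while contextual expansion also permits rewriting under a context, e.g. (int pair) list ⇝_pair (int × int) list at path 1. Both (int pair) list and (int × int) list have the complete expansion (int × int) list, and hence (int pair) list =_A (int × int) list.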

Semantics
Semantically, constraints are interpreted in the model M consisting of:

(i) The set of graphical types (henceforth referred to as ground types) t for types τ defined in
the above section.

(ii) The set of ground ambivalent types z for ambivalent types, defined as sets of ground types:

z ::= {t1 , . . . , tn }

Constraints are also interpreted under an implicit abbreviation context A.


A ground assignment ϕ is a partial mapping from type variables α to ground types. Similarly,
an ambivalent ground assignment ϑ is a partial mapping from ambivalent type variables ζ to

ground ambivalent types z. An environment ρ is a partial function from term variables x to sets
of ground ambivalent types.
Implications introduce equalities that must be taken into account when checking the consistency of ground ambivalent types, using an equational context E. A ground equational context E ::= · | E, t = t is a collection of assumed equations between ground types. We write E ⊨ t1 =_A t2 if t1, t2 are contextually equal under the equational context E. Consistency of ambivalent types, written E ⊨ z, is simply defined as pairwise equality under the equational context:

∀1 ≤ i, j ≤ |z|. E ⊨ ti =_A tj

Coercions z :> t are semantically defined by the axiom:

z =_A {t}
(Coercion)
z :> t

Intuitively, the axiom states that z :> t holds if the ambivalent type z is the non-ambiguous type t. This allows us to coerce ambivalent types to types, which is required when embedding ambivalent variables in rigid constraints R during pattern matching.
Satisfiability judgements, defined inductively, take the form E; ϑ; ϕ; ρ ⊨ C, read as: in the environment ρ, under the equational context E, the assignments ϑ, ϕ satisfy C:

(Truth)
E; ϑ; ϕ; ρ ⊨ true

∀i. E; ϑ; ϕ; ρ ⊨ Ci
(Conj)
E; ϑ; ϕ; ρ ⊨ C1 ∧ C2

E; ϑ; ϕ, α ↦ t; ρ ⊨ C
(Exists)
E; ϑ; ϕ; ρ ⊨ ∃α.C

∀t. E; ϑ; ϕ, α ↦ t; ρ ⊨ C
(Forall)
E; ϑ; ϕ; ρ ⊨ ∀α.C

E; ϑ, ζ ↦ z; ϕ; ρ ⊨ C    E ⊨ z
(Exists)
E; ϑ; ϕ; ρ ⊨ ∃ζ.C

E, ϕ(R); ϑ; ϕ; ρ ⊨ C
(Implication)
E; ϑ; ϕ; ρ ⊨ R =⇒ C

(ϑ; ϕ)(ψ) ⊆ ϑ(ζ)
(Subset)
E; ϑ; ϕ; ρ ⊨ ψ ⊆ ζ

ϑ(ζ1) =_A ϑ(ζ2)
(Eq)
E; ϑ; ϕ; ρ ⊨ ζ1 = ζ2

ϑ(ζ) :> ϕ(τ)
(Coerce)
E; ϑ; ϕ; ρ ⊨ ζ :> τ

ϑ(ζ) ∈ ρ(x)
(Inst0)
E; ϑ; ϕ; ρ ⊨ x ≤ ζ

ϑ(ζ) ∈ (E; ϑ; ϕ; ρ)(σ)
(Inst1)
E; ϑ; ϕ; ρ ⊨ σ ≤ ζ

E; ϑ; ϕ; ρ, (E; ϑ; ϕ; ρ)(Γ) ⊨ C
(Def)
E; ϑ; ϕ; ρ ⊨ def Γ in C

E; ϑ; ϕ; ρ ⊨ ∃Γ    E; ϑ; ϕ; ρ, (E; ϑ; ϕ; ρ)(Γ) ⊨ C
(Let)
E; ϑ; ϕ; ρ ⊨ let Γ in C

E; ϑ; ϕ; ρ, (E; ϑ; ϕ; ρ)(Π) ⊨ C
(Def-rec)
E; ϑ; ϕ; ρ ⊨ def rec Π in C

E; ϑ; ϕ; ρ ⊨ ∃Π    E; ϑ; ϕ; ρ, (E; ϑ; ϕ; ρ)(Π) ⊨ C
(Let-rec)
E; ϑ; ϕ; ρ ⊨ let rec Π in C

where the interpretation of constrained contexts and recursive contexts is given by:

(E; ϑ; ϕ; ρ)(∀α, ζ.C ⇒ ζ) = { ϑ′(ζ) : ϕ =\α ϕ′ ∧ ϑ =\ζ ϑ′ ∧ E; ϑ′; ϕ′; ρ ⊨ C }

(E; ϑ; ϕ; ρ)(∀α, ζ.C ⇒ xi : ζi) = xi ↦ (E; ϑ; ϕ; ρ)(∀α, ζ.C ⇒ ζi)

∃(∀α, ζ.C ⇒ ∆) ' ∀α.∃ζ.C

(E; ϑ; ϕ; ρ)(x : ∀α.C ⇐ τ) = ρ′
    s.t. ρ′(x) = { ϕ′(τ) : ϕ′ =\α ϕ ∧ E; ϑ; ϕ; ρ, ρ′[x ↦ ϕ(∀α.τ)] ⊨ C }

(E; ϑ; ϕ; ρ)(x : ∀α, ζ.C ⇒ ζ) = ρ′
    s.t. ρ′(xi) = (E; ϑ; ϕ; ρ, ρ′)(∀α, ζ.def ∆ in ⋀i Ci ⇒ ∆)(xi)
    where ∆ = xi : ζi

∃(x : ∀α.C ⇐ τ, x : ∀β, ζ.D ⇒ ζ) ' ∀β.∃ζ.def x : ∀α.τ, x : ζ in ⋀i Ci ∧ ⋀j Dj

Intuitively, ∃Γ checks whether the constraint C in Γ is satisfiable for all rigid variables α. Similarly, ∃Π checks that all constraints within Π are satisfiable within the recursive context.
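
For intuition, consider the constraint ∃ζ1. int ⊆ ζ1 ∧ ζ1 = ζ2 under an empty equational context. The subset constraint requires the ground type int to be a member of the ambivalent type chosen for ζ1, and the equality requires that choice to coincide (modulo A) with ϑ(ζ2); the constraint is therefore satisfied by any ϑ mapping ζ2 to the consistent ambivalent type {int}, taking {int} as the witness for ζ1.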

C Type System
In this appendix we present the entirety of Dromedary’s type system. We begin by formally
defining the complete multi-sorted algebra of types τ and type formers F:

τ ::= Type
|α Type variable
|ζ Ambivalent type variable
|τ →τ Function type
|τ T Applied type constructor
| τ × ··· × τ Tuple type
|Στ Polymorphic variant type
| ` : τ :: τ Row cons
| ∂τ Row uniform
| µα.τ Equi-recursive type
| τ where α = τ Explicit type substitution

F ::= Type former


|·→· Arrow former
|T Constructor former
| · × · × ··· × · Tuple former
| Σ· Variant former

where ` denotes a label and T denotes a type constructor. In the context of Dromedary’s
polymorphic variants, we define labels as ` ::= K. For more details regarding the multi-sorted
algebra of types, we refer the reader to Appendix B.
Split types For the translation of types τ into shallow types used in constraints, we require the
notion of split types. Split types ς are a pair Ξ B ζ, where the (deep) type may be reconstructed
from the subset constraints in Ξ and variable ζ.
More formally, the grammar of split types ς is given by:

ς ::= Ξ B ζ Ξ ::= ∃ζ.Ω Ω ::= · | Ω, ζ ⊇ ψ

where ψ is a shallow type, defined in Appendix B. As a notational convenience, we write Ξ B ψ for the split type ∃ζ.Ξ, ζ ⊇ ψ B ζ. The formal translation from deep types to split types is:

bαc = · B α
bζc = ∃ · . · Bζ
bτ1 → τ2 c = Ξ1 × Ξ2 B ζ1 → ζ2 where bτi c = Ξi B ζi
bτ Tc = Ξ1 × · · · × Ξn B ζ T where bτi c = Ξi B ζi
bτ1 × · · · × τn c = Ξ1 × · · · × Ξn B ζ1 × · · · × ζn where bτi c = Ξi B ζi
bΣ τ c = Ξ B Σ ζ where bτ c = Ξ B ζ

b` : τ1 :: τ2 c = Ξ1 , Ξ2 B ` : ζ1 :: ζ2 where bτi c = Ξi B ζi
b∂τ c = Ξ B ∂ζ where bτ c = Ξ B ζ
bµα.τ c = ∃ζ.Ξ, ζ ⊇ ψ B ζ where b{ζ/α}τ c = Ξ B ψ
bτ1 where α = τ2 c = Ξ1 × Ξ2 B ζ1 where bτ2 c = Ξ2 B ζ2 , b{ζ2 /α}τ1 c = Ξ1 B ζ1

We extend constraints with a subset constraint for types τ ⊆ ζ using shallow type translations,
such that the following equivalence holds:
τ ⊆ ζ ' ∃ζ. ⋀Ω ∧ ζ = ζ′    where bτ c = ∃ζ.Ω B ζ′
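
As a worked example, consider the deep type int list. Splitting int gives bintc = ∃ζ1.(ζ1 ⊇ int) B ζ1, and hence bint listc = ∃ζ1, ζ2.(ζ1 ⊇ int, ζ2 ⊇ ζ1 list) B ζ2. The subset constraint int list ⊆ ζ is therefore equivalent to ∃ζ1, ζ2. ζ1 ⊇ int ∧ ζ2 ⊇ ζ1 list ∧ ζ = ζ2.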

Typing Rules
In this section, we present all of Dromedary’s typing rules.
Structures A structural context Ψ is a sequence of label and constructor bindings, that is:

Ψ ::= Structural Context


|· Empty context
| Ψ, K : ∀α.∃β.R ⇒ [τ →] α T Constructor binding
| Ψ, ` : ∀α.(∀β.τ ) → α T Label binding
| Ψ, α T = τ Alias

We write bΨc as the abbreviation context A consisting of abbreviations (or aliases) in Ψ. The
following table specifies the judgements for structural contexts:

Judgement                                     Interpretation

Ψ ` K :: ∀α.∃β.R ⇒ [τ →] α T                  In Ψ, the constructor K has the associated constructor type scheme ∀α.∃β.R ⇒ [τ →] α T for the type constructor T.

Ψ ` ` : ∀α.(∀β.τ ) → α T                      In Ψ, the label ` has the associated label type scheme ∀α.(∀β.τ ) → α T for the type constructor T.

Ψ ` T { `1 ; . . . ; `n }                     In Ψ, the type constructor T is a record type with labels `1 , . . . , `n.

For a constraint-based formalisation of structures we extend the constraint language with a


notion of structural constraints S and definitions D, defined by the grammar:

S ::= D; ; Structural constraint

D ::= Definition
|· Empty definition
| D, D Conj definition
| def Γ Def binding
| let Γ Let binding

| def rec Π Recursive def binding
| let rec Π Recursive let binding

Υ ::= Multi-context binding


| ∀α.∃ζ.C ∧ def Γ Def binding
| let Γ Let binding

where multi-context bindings Υ are required for the value restriction.


Using structural contexts and constraints, our structural judgements are of the form Ψ; S ` str
read as: under the structural context Ψ and satisfiable structural constraint S (under bΨc), str
is well-formed.
Similarly, for structure items, judgements are of the form Ψ; D ` str item Ψ0 read as: under
structural context Ψ and satisfiable definition D (under bΨc), str item is well-formed, binding
a new structural context Ψ0 , given by:

(Dromedary-str-nil)
Ψ; · ` ·

Ψ0 ; D ` str item Ψ1 Ψ1 ; S ` str


(Dromedary-str-cons)
Ψ0 ; (D; ; S) ` str item; ; str

Ψ; Υ ` vb
(Dromedary-str-item-let)
Ψ; Υ ` let vb Ψ

Ψ; Π ` vb
(Dromedary-str-item-let-rec)
Ψ; let rec Π ` let rec vb Ψ

(Dromedary-str-item-type)
Ψ; · ` type td1 and . . . and tdn Ψ, td
Ψ0 = Ψ, K : ∀α.[∃β.] ⋀E ⇒ [τ →] α T
(Dromedary-str-item-type-ext)
Ψ; · ` type α T += K [of β.τ ] [constraint E] Ψ0

(Dromedary-str-item-external)
Ψ; def x : σ ` external x : σ = ”%. . . ” Ψ

(Dromedary-str-item-exception)
Ψ; · ` exception K [of τ ] Ψ, K : ∀ · .∃ · .true ⇒ [τ →] exn

Expressions Expression judgements are of the form C ` e : ζ, read as: under the satisfiable
assumptions C, the expression e has the ambivalent type ζ. As in the dissertation (section 3.1.1),
we leave the structural context Ψ used within the judgements implicit.
The restriction to ambivalent type variables in the judgement leads to a restricted and explicit
type system; thus, for a more natural presentation, we permit judgements of the form C ` e : τ

and C ` e : ψ, given by:

C`e:ζ ζ #τ
(Dromedary-exp-tau)
∃ζ.C ∧ τ ⊆ ζ ` e : τ

C`e:ζ ζ #ψ
(Dromedary-exp-shallow)
∃ζ.C ∧ ψ ⊆ ζ ` e : ψ

For various features in Dromedary’s type system discussed in Section 3.1, we expand the
constraint language with the following constructs:

Σ ::= ∃α.∀β.R =⇒ Γ Generalized Constrained Context


def ∃α.∀β.R =⇒ Γ in C ' ∃α.∀β.R =⇒ def Γ in C
let ∃α.∀β.R =⇒ Γ in C ' ∃α.∀β.R =⇒ let Γ in C

Υ ::= Multi-context binding


| ∃α.∀β.R =⇒ ∀α.∃ζ.C ∧ def Σ Def binding
| let Σ Let binding

uop ≤ ζ → ζ Unary operator instantiation


− ≤ ζ1 → ζ2 ' int ⊆ ζ1 ∧ int ⊆ ζ2
! ≤ ζ1 → ζ2 ' ∃ζ.ζ ref ⊆ ζ1 ∧ ζ = ζ2
ref ≤ ζ1 → ζ2 ' ∃ζ.ζ = ζ1 ∧ ζ ref ⊆ ζ2

bop ≤ ζ → ζ → ζ Binary operator instantiation


(+ | − | / | ×) ≤ ζ1 → ζ2 → ζ3 ' int ⊆ ζ1 ∧ int ⊆ ζ2 ∧ int ⊆ ζ3
:= ≤ ζ1 → ζ2 → ζ3 ' ∃ζ.ζ ref ⊆ ζ1 ∧ ζ2 = ζ ∧ unit ⊆ ζ3

Constructor instantiation
K ≤ [ζ1 →] ζ2 ' ∃ζα , ζβ .θR ∧ ζα T ⊆ ζ2 [∧ θτ ⊆ ζ1 ] if Ψ ` K : ∀α.∃β.R ⇒ [τ →] α T
where θ = {ζα /α, ζβ /β}

Label constraints
` ≤ ζ1 → ζ2 ' ∃ζα , ζβ .{ζα /α, ζβ /β}τ ⊆ ζ1 ∧ ζα T ⊆ ζ2 if Ψ ` ` : ∀α.(∀β.τ ) → α T
` : (∀β.C ⇒ ζ1 ) → ζ2 ' ∃ζα .ζα T ⊆ τ2 ∧ ∀β.{ζα /α}τ ⊆ ζ1 ∧ C if Ψ ` ` : ∀α.(∀β.τ ) → α T

Variant constraints
ζ ≤ ‘K1 [of ζ1 ] | . . . | ‘Kn [of ζn ] ' ζ ⊇ ‘K1 : [ζ1 ] :: . . . :: ‘Kn : [ζn ] :: ∂ absent
ζ ≥ ‘K1 [of ζ1 ] | . . . | ‘Kn [of ζn ] ' ∃ζρ .ζ ⊇ ‘K1 : [ζ1 ] :: . . . :: ‘Kn : [ζn ] :: ζρ

The typing rules are now given by:

Cx≤ζ
(Dromedary-exp-var)
C`x:ζ

Cc≤ζ
(Dromedary-exp-const)
C`c:ζ

C ` p → e : ζ1 ⇒ ζ2
(Dromedary-exp-fun)
C ` fun p → e : ζ1 → ζ2

C1 ` e1 : ζ1 → ζ2 C2 ` e2 : ζ1
(Dromedary-exp-app)
C1 ∧ C2 ` e1 e2 : ζ2

Υ ` vb C`e:ζ
(Dromedary-exp-let)
Υ in C ` let vb in e : ζ

Π ` vb C`e:ζ
(Dromedary-exp-let-rec)
let rec Π in C ` let vb in e : ζ

C1 ` e : ζ1 C2  uop ≤ ζ1 → ζ2
(Dromedary-exp-uop)
C1 ∧ C2 ` uop e : ζ2

C1 ` e1 : ζ1 C2 ` e2 : ζ2 C3  bop ≤ ζ1 → ζ2 → ζ3
(Dromedary-exp-bop)
C1 ∧ C2 ∧ C3 ` e1 bop e2 : ζ3

C1 ` e1 : bool C2 ` e2 : ζ C3 ` e3 : ζ
(Dromedary-exp-ifthenelse)
C1 ∧ C2 ∧ C3 ` if e1 then e2 else e3 : ζ

C1 ` e : ζ1 C2  x ≤ ζ2
(Dromedary-exp-forall)
let ∀α, ζ.C1 ⇒ x : ζ2 in C2 ` forall (type α) → e : ζ1

C ` {ζ/α}e : ζ
(Dromedary-exp-exists)
∃ζ.C ` exists (type α) → e : ζ

C  τ ⊆ ζ1 C  τ ⊆ ζ2 C ` e : ζ2
(Dromedary-exp-annot)
C ` (e : τ ) : ζ1

∀1 ≤ i ≤ n. Ci ` `i = ei : ζ    Ψ ` T { `1 ; . . . ; `n }
(Dromedary-exp-record)
C1 ∧ · · · ∧ Cn ` { `1 = e1 ; . . . ; `n = en } : ζ

C ` e : ζ1
(Dromedary-exp-record-field)
` : ∀β.C ⇒ ζ1 → ζ2 ` ` = e : ζ2

C  ` ≤ ζ1 → ζ2 C ` e : ζ2
(Dromedary-exp-field)
C ` e.` : ζ1

∀1 ≤ i ≤ n. Ci ` ei : ζi
(Dromedary-exp-tuple)
C1 ∧ · · · ∧ Cn ` (e1 , . . . , en ) : ζ1 × · · · × ζn

C  K ≤ [ζ1 →] ζ2 [C ` e : ζ1 ]
(Dromedary-exp-construct)
C ` K [e] : ζ2

Ce ` e : ζe    ∀1 ≤ i ≤ n. Ci ` hi : ζe ⇒ ζ
(Dromedary-exp-match)
Ce ∧ C1 ∧ · · · ∧ Cn ` match e with (h1 | . . . | hn ) : ζ

Ce ` e : ζ    ∀1 ≤ i ≤ n. Ci ` hi : exn ⇒ ζ
(Dromedary-exp-try)
Ce ∧ C1 ∧ · · · ∧ Cn ` try e with (h1 | . . . | hn ) : ζ

C1 ` e1 : unit C 2 ` e2 : ζ
(Dromedary-exp-seq)
C1 ∧ C2 ` e 1 ; e 2 : ζ

C1 ` e1 : int C2 ` e2 : int C3 ` e3 : unit


(Dromedary-exp-for)
C1 ∧ C2 ∧ def i : int in C3 ` for i = e1 (to | downto) e2 do e3 done : unit

C1 ` e1 : bool C2 ` e2 : unit
(Dromedary-exp-while)
C1 ∧ C2 ` while e1 do e2 done : unit

[C ` e : ζ 0 ] C  ζ ≥ ‘K [of ζ 0 ]
(Dromedary-exp-variant)
C ` ‘K [e] : ζ

C ` e : ζe    C  ζe ≤ ‘Ki [of ζi ]
∀1 ≤ i ≤ n. Ci ` ‘Ki [pi ] → ei : ‘Ki [of ζi ] ⇒ ζ
(Dromedary-exp-var-match-closed)
C ∧ C1 ∧ · · · ∧ Cn ` match e with ‘Ki [pi ] → ei : ζ

C ` e : ζe    C  ζe ≥ ‘Ki [of ζi ]
∀1 ≤ i ≤ n. Ci ` ‘Ki [pi ] → ei : ‘Ki [of ζi ] ⇒ ζ
Cn+1 ` en+1 : ζ
(Dromedary-exp-var-match-open)
C ∧ C1 ∧ · · · ∧ Cn+1 ` match e with (‘Ki [pi ] → ei | _ → en+1 ) : ζ

C ` e : ζ2
(Dromedary-exp-eq)
C ∧ ζ1 = ζ2 ` e : ζ1

C ` e : ζ2    ζ1 ≠ ζ2
(Dromedary-exp-exist)
∃ζ1 .C ` e : ζ2

Judgements for non-recursive and recursive value bindings are of the form Υ ` vb and π ` vb,
respectively.
C`e:ζ
(Dromedary-vb-rec-mono)
x : ∀α, fav(C), ζ.C ⇒ ζ ` (type α) x = e

C`e:τ
(Dromedary-vb-rec-poly)
x : ∀α.C ⇐ τ ` (type α) x : τ = e

Cp ` p : ζ ∃α, β.∆ ⇒ R Ce ` v : ζ
(Dromedary-vb-val)
let ∃α.∀β.R =⇒ ∀γ, fav(Cp , Ce , ∆).Cp ∧ Ce ⇒ ∆ ` (type γ) p = v

Cp ` p : ζ ∃α, β.∆ ⇒ R Ce ` e : ζ
(Dromedary-vb-nonval)
∃α.∀β.R =⇒ ∀γ.∃fav(Ce , Cp , ∆).Ce ∧ Cp ∧ def ∆ ` (type γ) p = e

Patterns Judgements for patterns and cases are of the form C ` p : ζ  Θ and C ` p → e : ζ1 ⇒ ζ2, interpreted as: under the satisfiable assumptions C, the pattern p has the type ζ, binding variables in the generalized fragment Θ; and under the satisfiable assumptions C, the case p → e matches values of type ζ1, returning values of type ζ2, respectively.
A generalized fragment Θ is a tuple, consisting of a context of flexibly bound variables α in
rigid constraints, existential variables β, a rigid constraint R, and a fragment ∆, written as
Θ ::= ∃α, β.∆ ⇒ R.
The addition of flexibly bound (non-ambivalent) variables α in Θ is for propagation of type
information from instantiation constraints defined below. These constraints involve coercion
constraints ζ :> α, which ensure our ambivalent types can be coerced to non-ambiguous types.
We redefine constructor instantiation constraints for patterns, since constructor instantiation
for patterns semantically differs from instantiation in expressions, using the following equivalences:

K ≤ ∃α.ζ ⇒ R    Nullary data constructor instantiation

K ≤ ∃α.ζ ⇒ R ' ∃ζ. ζ T ⊆ ζ ∧ ⋀i ζi :> αi    if Ψ ` K : ∀α0 .R ⇒ α0 T, where α = fv(R)

K ≤ ∃α, β.ζ → ζ ⇒ R    Unary data constructor instantiation

K ≤ ∃α, β.ζ1 → ζ2 ⇒ R ' ∃ζ. {ζ/α}τ ⊆ ζ1 ∧ ζ T ⊆ ζ2 ∧ ⋀i ζi :> αi    if Ψ ` K : ∀α0 .∃β.R ⇒ τ → α0 T, where α = fv(R)
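
To illustrate with a hypothetical GADT (not one drawn from the dissertation's benchmark programs), consider a declaration type α t with a single nullary constructor K carrying the constraint α = int, so that Ψ ` K : ∀α.(α = int) ⇒ α t. Matching the pattern K against a scrutinee of ambivalent type ζ instantiates this scheme as ∃ζα. ζα t ⊆ ζ ∧ ζα :> α: the scrutinee must be an application of t, and the coercion ties the fresh rigid variable α to a non-ambiguous type. The corresponding generalized fragment ∃α.· ⇒ (α = int), produced by Dromedary-pat-construct0, then allows the equation α = int to be assumed in the branch through the rigid implication introduced by Dromedary-case.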

As with expressions, we permit judgements involving types and ambivalent structures, yielding
the analogous rules Dromedary-pat-tau and Dromedary-pat-shallow. The typing rules are given
by:

C ` p → e : ζ1 ⇒ ζ2
(Dromedary-var-case1 )
C ` ‘K [p] → e : ‘K of ζ1 ⇒ ζ2

C ` e : ζ
(Dromedary-var-case2 )
C ` ‘K → e : ‘K ⇒ ζ

C1 ` p : ζ1 ∃β.∆ ⇒ R C2 ` e : ζ2
(Dromedary-case)
∃α.∀β.R =⇒ let ∀fav(C1 , ∆).C1 ⇒ ∆ in C2 ` p → e : ζ1 ⇒ ζ2

(Dromedary-pat-wild)
C` :ζ ∃ · .· ⇒ true

(Dromedary-pat-var)
C`x:ζ ∃ · .x : ζ ⇒ true

Cc≤ζ
(Dromedary-pat-const)
C`c:ζ ∃ · .· ⇒ true

C  K ≤ ∃α.ζ ⇒ R
(Dromedary-pat-construct0 )
C`K:ζ ∃α.· ⇒ R

C  K ≤ ∃α, β.ζ1 → ζ2 ⇒ R C ` p : ζ1 Θ
(Dromedary-pat-construct1 )
C ` K [p] : ζ2  ∃α, β.Θ ⇒ R

∀1 ≤ i ≤ n. Ci ` pi : ζi Θi
^n (Dromedary-pat-tuple)
i=1 Ci ` (p1 , . . . , pn ) : ζ1 × · · · × ζn Θ1 × · · · × Θn

C ` p : ζ2  Θ
(Dromedary-pat-eq)
C ∧ ζ1 = ζ2 ` p : ζ1  Θ

C ` p : ζ2  Θ    ζ1 ≠ ζ2
(Dromedary-pat-exist)
∃ζ1 .C ` p : ζ2  Θ

D Computations
The domain-specific language for computations (Section 3.2.3) is embedded with the following
signature:

module type S = sig


(** A computation ['a Computation.t] represents a
monadic computation that produces a value of type ['a].

Computations are designed for computing (or generating)


['a Constraint.t]'s, thus its syntax provided by [ppx_let]
is altered (from standard Monadic Let_syntax) for this.

Computations are bound using the [let%bind] syntax:


{[
val comp1 : Typedtree.pattern Constraint.t Computation.t

let%bind pat1 = comp1 in


let%bind pat2 = comp1 in
...
]}
*)
type 'a t
include Monad.S with type 'a t := 'a t

(** [const x] creates a computation that returns a


constraint ['a Constraint.t] that evaluates to [x]. *)
val const : 'a -> 'a Constraint.t t

(** [fail err] raises the error [err]. *)


val fail : Sexp.t -> 'a t

(** [of_result result ~on_error] lifts the result [result]


into a computation, using [on_error] to compute the error
message for the computation. *)
val of_result
: ('a, 'err) Result.t
-> on_error:('err -> Sexp.t)
-> 'a t

module Binder : sig


type 'a computation := 'a t

(** A ['a Binder.t] represents a monadic binding context for a


['b Constraint.t Computation.t]. They are designed to provide
an intuitive notion of "compositional" binding.

Computations are bound using the let-op [let&].

*)
type 'a t
include Monad.S with type 'a t := 'a t

val exists : unit -> Constraint.variable t


val forall : unit -> Constraint.variable t
val exists_vars : Constraint.variable list -> unit t
val forall_ctx : ctx:Constraint.universal_context -> unit t
val exists_ctx : ctx:Constraint.existential_context -> unit t
val of_type : Constraint.Type.t -> Constraint.variable t

module Let_syntax : sig


val return : 'a -> 'a t
val ( let& ) : 'a computation -> ('a -> 'b t) -> 'b t
val ( >>| ) : 'a Constraint.t -> ('a -> 'b) -> 'b Constraint.t

val ( <*> )
: ('a -> 'b) Constraint.t
-> 'a Constraint.t
-> 'b Constraint.t

module Let_syntax : sig


val return : 'a -> 'a t
val map : 'a Constraint.t -> f:('a -> 'b) -> 'b Constraint.t
val both
: 'a Constraint.t
-> 'b Constraint.t
-> ('a * 'b) Constraint.t

val bind : 'a t -> f:('a -> 'b t) -> 'b t


end
end
end

(** [Let_syntax] does not follow the conventional [Let_syntax]


signature for a Monad. Instead we have standard [return]
and [bind], however, the [map] and [both] are used
for constructing constraints.

This allows the pattern for constructing constraints:


{[
let%bind p1 = comp1 in
let%bind p2 = comp2 in
return
(let%map () = var1 =~ var2 in
...)
]}
Binders are bound using the let-op [let@].
*)

module Let_syntax : sig
val return : 'a -> 'a t
val ( let@ )
: 'a Binder.t
-> ('a -> 'b Constraint.t t)
-> 'b Constraint.t t

val ( >>| )
: 'a Constraint.t
-> ('a -> 'b)
-> 'b Constraint.t

val ( <*> )
: ('a -> 'b) Constraint.t
-> 'a Constraint.t
-> 'b Constraint.t

module Let_syntax : sig


val return : 'a -> 'a t
val map : 'a Constraint.t -> f:('a -> 'b) -> 'b Constraint.t
val both
: 'a Constraint.t
-> 'b Constraint.t
-> ('a * 'b) Constraint.t

val bind : 'a t -> f:('a -> 'b t) -> 'b t


end
end
end
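
To give a flavour of how this signature is intended to be used, the following is a hypothetical constraint-generation fragment for a pair expression (e1, e2). Only [let@], [let%bind], [let%map], [return], [Binder.exists] and the [=~] equality shown in the documentation above come from the signature; [infer_exp], [tuple_former], [Texp_tuple] and the surrounding module layout are assumptions of this sketch rather than the implementation's actual API.

open Computation
open Computation.Let_syntax

(* Hypothetical usage sketch: [Texp_tuple], [infer_exp], [tuple_former]
   and the exact provenance of [( =~ )] are assumptions. *)
let infer_pair ~infer_exp ~tuple_former e1 e2 (zeta : Constraint.variable)
    : Typedtree.expression Constraint.t Computation.t
  =
  (* Allocate fresh existential variables for the two components. *)
  let@ zeta1 = Binder.exists () in
  let@ zeta2 = Binder.exists () in
  (* Generate the sub-constraints for the components. *)
  let%bind c1 = infer_exp e1 zeta1 in
  let%bind c2 = infer_exp e2 zeta2 in
  return
    (* Combine the pieces applicatively: [zeta] must equal the tuple of
       the component variables, and the elaborated subexpressions are
       paired back up once the constraint is solved. *)
    (let%map () = zeta =~ tuple_former [ zeta1; zeta2 ]
     and e1' = c1
     and e2' = c2 in
     Texp_tuple [ e1'; e2' ])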

E Proposal

Typing OCaml in OCaml:


A Constraint-Based Approach
Part II Project Proposal

2377E

Computer Science Tripos

October 18, 2021


Project Originator: 2377E

Project Supervisors: Mistral Contrastin and Dr. Jeremy Yallop


Signatures:

Project Overseers: Prof. Andrew Moore and Andreas Vlachos


Signatures:

Director of Studies:
Signatures:

Introduction
Objective Caml (OCaml) introduced by X. Leroy [7] is a popular and advanced functional
programming language based on the ML language – a simple calculus defined by R. Milner [8]
offering a restricted form of polymorphism, known as let-based polymorphism, with decidable
type inference.
The core language (referred to as Core ML) extends ML with the following features: mutually
recursive let-bindings, algebraic data types, patterns, constants, records, mutable references
(and the value restriction), exceptions and type annotations. OCaml’s major extensions on
Core ML consist of first-class and recursive modules, classes and objects, polymorphic variants,
semi-explicit first-class polymorphism, generalized algebraic data types (GADTs), the relaxed
value restriction, type abbreviations, and labels.
In this project, we will implement a constraint-based type inference algorithm for a subset of OCaml, provisionally dubbed Dromedary, consisting of ML with mutually recursive let-bindings, records, type annotations (a subset of Core ML, provisionally dubbed Procaml, after Procamelus, an extinct genus of camel) and GADTs.
OCaml’s inference algorithm is based on algorithm W [8] with D. Remy’s [11] efficient rank-based
generalization and modifications for the above extensions. While efficient, it has become difficult
to maintain and evolve [14]. Dromedary’s solution is to re-implement OCaml’s type inference
using a constraint-based approach.
A constraint-based approach would provide a modular implementation of type inference, with
separate constraint generation, constraint solving, and type reconstruction phases, using a small
independent constraint language. Additional advantages include: combining existing constraint-
based approaches with OCaml’s approaches to increase permissiveness and applications in
OCaml’s ecosystem.
Previous work to improve OCaml’s inference algorithm focuses on incremental changes to the
current implementation [14]. Whereas our work is more ambitious and aims to provide the
foundation for a complete rewrite – which we believe to be worthwhile.
Despite Dromedary being seemingly simple, its inference will suffer from the many challenging
issues of GADT type inference, with previous work highlighting that:
• Type Systems with GADTs lack the principal (“most general”) type property. M. Sulzmann
et al. [13] show that programs with GADTs have infinitely many maximal types. Hence
a complete (unrestricted ) inference algorithm must consider all of these types, adding
significant complexity.
• GADT pattern matching introduces local typing constraints, that may result in different
branch types. Reconciling these types is difficult.
• GADT programs extensively rely on A. Mycroft’s polymorphic recursion [9]. However, F.
Henglein [4] and A. J. Kfoury et al. [6] proved that inference with polymorphic recursion
is undecidable.
Dromedary addresses these issues via a novel combination of Haskell’s OutsideIn [12] and
OCaml’s ambivalent types [2]. Constraint propagation and ambivalent types equip Dromedary
with sufficient expressiveness to reconcile differing branch types. Dromedary will require type
annotations for polymorphic recursion, guaranteeing the decidability of inference.
We will evaluate Dromedary’s inference algorithm against OCaml’s (with respect to the imple-
mented features), considering aspects such as permissiveness and efficiency.
Starting Point
I’m familiar with types, having studied Semantics of Programming Languages. I have no
previous experience in type inference beyond ML’s classical inference algorithms [8]. I have a
basic knowledge of constraint solving having studied Prolog and Logic and Proof.
Prior to starting, I have read literature on OCaml’s type system to investigate the feasibility of
the project. I have practical experience writing OCaml programs from Foundations of Computer
Science and extra-curricular study. I have practical experience extending the OCaml type checker.

Structure of the Project


The aim of this project is to implement a constraint-based inference algorithm for a subset of
OCaml, called Dromedary.
A number of design choices have already been made in order to make a concrete plan.

1. Dromedary’s type system will be formally defined, using concepts from Semantics of
Programming Languages. Its operational semantics is given by a subset of OCaml’s
semantics.
GADTs will use a novel combination of Haskell’s OutsideIn [12] and OCaml’s ambivalent
types [2], designed to increase permissiveness.

2. We will design a (first-order) constraint language for Dromedary. We will then define a
mapping (known as constraint generation mapping) from candidate typing judgements
(e.g. e : τ ) to constraints.

3. A constraint solver for constraints C will be defined.

4. Several properties of Dromedary will be stated but not proved. These include principal
types, decidability, soundness and completeness of inference. We will verify these properties
empirically, using tests from the OCaml type checker test suite.

5. Dromedary’s inference algorithm will extend F. Pottier’s framework [10] for modular and
efficient constraint generation, constraint solving, and type reconstruction, implemented
in OCaml.
The first-order unification algorithm for constraint solving will follow Huet [5], using an
efficient union-find data structure.

This project will follow an incremental structure, focusing on Procaml followed by GADTs, at
each stage, extending the type system, semantics, constraint language, and constraint solver.
Thus the structure of the project is as follows:

1. An in-depth study in OCaml’s type system to ensure I have the correct details before
starting work, focusing on [2].

2. An in-depth study of Haskell’s OutsideIn [12].

3. Defining Dromedary’s type system for Procaml.

4. Defining and implement Dromedary’s constraints and constraint solving for Procaml.

5. Implementing Dromedary’s GADTs.

6. Verifying the correctness of Dromedary’s type inference algorithm via tests.

7. Performing a qualitative study into the permissiveness of Dromedary’s inference algorithm.

8. Benchmarking Dromedary against OCaml’s current (4.12.0) implementation.

Success Criteria
For the project to be deemed a success, the following must be successfully completed:

1. Design the type system of Dromedary. This should support ML with GADTs.

2. Design the constraint language and constraint generation for Dromedary.

3. Implement a constraint-based inference algorithm for Procaml.

4. Implement constraint-based inference for GADTs.


We note that this criterion is not restricted by the design choices from Section 3. Hence it
will be considered a success provided any GADT inference is implemented.

5. Evaluate the permissiveness and efficiency of Dromedary’s inference against OCaml’s


current (4.12.0) implementation.

Possible Extensions
The following is a list of possible extensions to the project:

1. Adding mutable references, exceptions and the value restriction to Dromedary.

2. Adding polymorphic variants [1] to Dromedary. The implementation will require the
notion of subtyping constraints.

3. Adding semi-explicit first-class polymorphism [3] to Dromedary. This will use an efficient
rank-based approach [11].

4. Proving properties about Dromedary’s semantics and inference, including progress, preser-
vation, principal types, and the soundness and completeness of inference.

Timetable and Milestones


Weeks 1 to 2 (7th Oct – 20th Oct)
Proposal Submitted.
Read ahead in the Types course about System F. Read up on advanced constraint-based type
inference of ML.
Read and make notes on the following papers [2] and [12].
Formalize the type system for the Procaml subset of Dromedary.
Milestone: Formalized type system of Procaml.

Weeks 3 to 4 (21st Oct – 3rd Nov)
Define the constraint language and its semantics.
Implement Dromedary’s constraint solver as a set of constraint rewriting rules. Write some
example constraints and verify the solver solves them correctly.
Milestone: Implemented constraint solver for Dromedary.

Weeks 5 to 6 (4th Nov – 17th Nov)


Define the mapping from candidate typing judgments to constraints for Procaml, the constraint
generation mapping.
Implement constraint generation and type reconstruction for Procaml. Write some Procaml
programs and verify that their types are correctly inferred.
Milestone: Implemented type inference for Procaml.

Weeks 7 to 8 (18th Nov – 1st Dec)


Formalize the semantics of GADTs. Define constraint generation for GADTs.
Implement Dromedary’s inference for GADTs.
Milestone: Formalized Dromedary’s GADTs!
Milestone: Finish core project implementation.

Weeks 9 to 10 (2nd Dec – 15th Dec)


End of Michaelmas term – start of Christmas holidays.
Slack time to finish off any implementation.
Prepare test cases to evaluate the permissiveness and efficiency of Dromedary.
Qualitatively evaluate the permissiveness of Dromedary’s inference algorithm.

Weeks 11 to 12 (16th Dec – 29th Dec)


Take time off for Christmas.

Weeks 13 to 14 (30th Dec – 12th Jan)


Take time off for New Year’s.

Weeks 15 to 16 (13th Jan – 26th Jan)


End of Christmas holidays – start of Lent term.
Benchmark Dromedary’s inference algorithm against the current OCaml (4.12.0) implementation.
Draft the Progress report and presentation and discuss with supervisor ahead of deadline.
Milestone: Finish core project evaluation.

Weeks 17 to 18 (27th Jan – 9th Feb)
Progress report deadline and presentation.
Start work on possible project extensions if time permits. Focus on extension (1) - adding
polymorphic variants to Dromedary.
Add additional test cases for work completed on extension (1).
Milestone: Finish implementation of extension (1).
Milestone: Complete progress report and presentation.

Weeks 19 to 20 (10th Feb – 23rd Feb)


Extend benchmarking to include polymorphic variants, from extension (1).
If additional time, implement extension (2) - adding semi-explicit first-class polymorphism to
Dromedary.
Add additional test cases for work completed on extension (2).
Milestone: Finish implementation of extension (2).

Weeks 21 to 22 (24th Feb – 9th Mar)


Extend benchmarking to include semi-explicit first-class polymorphism, from extension (2).
Start writing drafts for Introduction and Preparation chapters.
Milestone: Complete draft of Introduction and Preparation chapters.

Weeks 23 to 24 (10th Mar – 23rd Mar)


End of Lent term – start of Easter holidays.
Begin writing draft Implementation chapter.
Milestone: Complete draft of Implementation chapter.

Weeks 25 to 26 (24th Mar – 6th Apr)


Write-up draft Evaluation chapter, discuss with supervisor.
Additional time allocated to improve evaluation based on feedback.
Finish Conclusions chapter.
Milestone: Complete first draft of dissertation.

Weeks 27 to 28 (7th Apr – 20th Apr)


Revise for Part II exams while awaiting supervisor feedback.

Weeks 29 to 30 (21st Apr – 4th May)


End of Easter holidays – start of Easter term.

Incorporate feedback from supervisor and submit a new draft to supervisor and director of
studies.
Slack time for improving code quality, focusing on documentation and code style.

Week 31 to 32 (5th May – 13th May)


Revise for Part II exams.
Milestone (13th May): Submit Dissertation!

Resource Declaration
I will be using my personal computer (3.20GHz i7-8700, 16GB RAM, 1TB SSD) as my primary
machine for software development. I accept full responsibility for this machine and I have made
contingency plans to protect myself against hardware and/or software failure.
As a backup, I will use my personal laptop (Razer Blade Stealth 2017 – 1.80 GHz i7-8500U,
16GB RAM, 1TB SSD) and the Computing Service’s MCS. I will periodically backup the
dissertation and project implementation to Git version control (GitHub).

References
[1] Jacques Garrigue. “Simple Type Inference for Structural Polymorphism”. In: The Second
Asian Workshop on Programming Languages and Systems, APLAS’01, Korea Advanced
Institute of Science and Technology, Daejeon, Korea, December 17-18, 2001, Proceedings.
2001, pp. 329–343.
[2] Jacques Garrigue and Didier Rémy. “Ambivalent Types for Principal Type Inference with
GADTs”. In: Programming Languages and Systems - 11th Asian Symposium, APLAS
2013, Melbourne, VIC, Australia, December 9-11, 2013. Proceedings. Ed. by Chung-chieh
Shan. Vol. 8301. Lecture Notes in Computer Science. Springer, 2013, pp. 257–272. doi:
10.1007/978-3-319-03542-0_19. url: https://doi.org/10.1007/978-3-319-03542-0_19.
[3] Jacques Garrigue and Didier Rémy. “Semi-Explicit First-Class Polymorphism for ML”.
In: Inf. Comput. 155.1-2 (1999), pp. 134–169. doi: 10.1006/inco.1999.2830. url:
https://doi.org/10.1006/inco.1999.2830.
[4] Fritz Henglein. “Type Inference with Polymorphic Recursion”. In: ACM Trans. Program.
Lang. Syst. 15.2 (1993), pp. 253–289. doi: 10.1145/169701.169692. url: https://doi.org/10.1145/169701.169692.
[5] Gérard P. Huet. “A Unification Algorithm for Typed lambda-Calculus”. In: Theor. Comput.
Sci. 1.1 (1975), pp. 27–57. doi: 10.1016/0304-3975(75)90011-0. url: https://doi.
org/10.1016/0304-3975(75)90011-0.
[6] A. J. Kfoury, Jerzy Tiuryn, and Pawel Urzyczyn. “Type Reconstruction in the Presence of
Polymorphic Recursion”. In: ACM Trans. Program. Lang. Syst. 15.2 (1993), pp. 290–311.
doi: 10.1145/169701.169687. url: https://doi.org/10.1145/169701.169687.
[7] Xavier Leroy. The ZINC experiment: an economical implementation of the ML language.
Technical report 117. INRIA, 1990. url: https://xavierleroy.org/publi/ZINC.pdf.
[8] Robin Milner. “A Theory of Type Polymorphism in Programming”. In: J. Comput. Syst.
Sci. 17.3 (1978), pp. 348–375. doi: 10.1016/0022-0000(78)90014-4. url: https://doi.org/10.1016/0022-0000(78)90014-4.
[9] Alan Mycroft. “Polymorphic Type Schemes and Recursive Definitions”. In: Interna-
tional Symposium on Programming, 6th Colloquium, Toulouse, France, April 17-19, 1984,
Proceedings. Ed. by Manfred Paul and Bernard Robinet. Vol. 167. Lecture Notes in Com-
puter Science. Springer, 1984, pp. 217–228. doi: 10.1007/3-540-12925-1_41. url:
https://doi.org/10.1007/3-540-12925-1_41.
[10] François Pottier. “Hindley-milner elaboration in applicative style: functional pearl”. In:
Proceedings of the 19th ACM SIGPLAN international conference on Functional program-
ming, Gothenburg, Sweden, September 1-3, 2014. Ed. by Johan Jeuring and Manuel
M. T. Chakravarty. ACM, 2014, pp. 203–212. doi: 10.1145/2628136.2628145. url:
https://doi.org/10.1145/2628136.2628145.
[11] Didier Rémy. Extending ML Type System with a Sorted Equational Theory. Research
Report 1766. Rocquencourt, BP 105, 78 153 Le Chesnay Cedex, France: Institut National
de Recherche en Informatique et Automatisme, 1992. url: http://gallium.inria.fr/
~remy/ftp/eq-theory-on-types.pdf.

[12] Tom Schrijvers et al. “Complete and decidable type inference for GADTs”. In: Proceeding
of the 14th ACM SIGPLAN international conference on Functional programming, ICFP
2009, Edinburgh, Scotland, UK, August 31 - September 2, 2009. Ed. by Graham Hutton
and Andrew P. Tolmach. ACM, 2009, pp. 341–352. doi: 10.1145/1596550.1596599. url:
https://doi.org/10.1145/1596550.1596599.
[13] Martin Sulzmann et al. Type inference for GADTs via Herbrand constraint abduction.
Tech. rep. Jan. 2008. url: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.
1.1.142.4392.
[14] The OCaml Team. TODO for the OCaml type-checker implementation. 2020. url: https:
//github.com/ocaml/ocaml/blob/4.12.0/typing/TODO.md.
