UNIFYING CAUSAL INFERENCE AND REINFORCEMENT LEARNING USING HIGHER-ORDER CATEGORY THEORY ∗

A PREPRINT

Sridhar Mahadevan
Adobe Research and University of Massachusetts, Amherst
[email protected], [email protected]
arXiv:2209.06262v1 [cs.AI] 13 Sep 2022

September 15, 2022

ABSTRACT
We present a unified formalism for structure discovery of causal models and predictive state
representation (PSR) models in reinforcement learning (RL) using higher-order category theory.
Specifically, we model structure discovery in both settings using simplicial objects, contravariant
functors from the category of ordinal numbers into any category. Fragments of causal models that are
equivalent under conditional independence – defined as causal horns – as well as subsequences of
potential tests in a predictive state representation – defined as predictive horns – are both special
cases of horns of a simplicial object, subsets resulting from the removal of the interior and the face
opposite a particular vertex. Latent structure discovery in both settings involves the same fundamental
mathematical problem of finding extensions of horns of simplicial objects by solving lifting problems
in commutative diagrams, and exploiting weak homotopies that define higher-order symmetries.
Solutions to the problem of filling “inner” vs. “outer” horns lead to various notions of higher-order
categories, from Kan complexes to quasicategories and ∞-category theory. We define the abstract
problem of structure discovery in both settings in terms of adjoint functors between the category
of universal causal models or universal decision models and its simplicial object representation. In
general, the left adjoint functor from a simplicial object X to a category C is lossy, preserving only
relationships up to a certain order defined by homotopical equivalences. In contrast, the right adjoint
defining the nerve of a category constructs a lossless encoding of a category as a simplicial object.

Keywords AI · Category Theory · Causal Inference · Simplicial Objects · Machine Learning · Statistics

1 Introduction
Causal inference (Pearl, 2009a; Imbens and Rubin, 2015; Spirtes et al., 2000) and predictive state representations (PSRs)
(Singh et al., 2004) in reinforcement learning (Sutton and Barto, 1998), whose roots go back to earlier work on subspace
identification in linear systems (Van Overschee and De Moor, 1996) and even earlier work on algebraic theories of
context-free languages (Chomsky and Schützenberger, 1963) and algebraic automata theory (Give’on and Arbib, 1968),
both involve structure discovery of a latent variable model through interventions. The use of superficially dissimilar
representations – directed acyclic graphs (DAGs) (Pearl, 1989), hybrid undirected and directed graphs (Lauritzen and
Richardson, 2002) and hyperedge graphs (Forré and Mooij, 2017; Evans, 2018) in causal inference, versus Hankel
matrix and Hilbert space embeddings of dynamical systems – has long obscured their deeper connections. Structure
discovery in causal inference and in PSRs both involve the determination of a latent structure, which is directional at
lower orders, but homotopy equivalences at higher orders induce symmetries. In particular, causal inference involves
determining a structure, such as a DAG that encodes direct causal effects between a pair of objects, but multiple DAG
models are equivalent because of symmetries induced by conditional independences (Dawid, 2001; Studený et al.,
2010a) and correlations induced by latent unobservable confounders that are only revealed over higher-order simplices
(e.g., DAGs over n ≥ 3 vertices). PSRs represent “hidden state” in dynamical systems by constructing a series of tests,

Draft under revision. Comments welcome.
A PREPRINT - SEPTEMBER 15, 2022

[Figure 1 graphic not recoverable from the extraction. Panel (A) shows the simplicial object X0, X1, X2, X3, ..., Xn; panels (B), (C) and (D) show horns drawn over the vertices V1, ..., V7.]
Figure 1: (A): In this paper, we unify structure prediction in causal inference and RL in terms of a simplicial object
Xn, n ≥ 0, a contravariant functor from ordinal numbers into a category. Simplicial objects originated in algebraic
topology (May, 1992), but have since become the foundation for higher-order category theory (Boardman and Vogt, 1973;
Joyal, 2002; Lurie, 2022). The internal structure of Xn involves composable morphisms of dimension n. 0th order objects
in X0 involve only identity morphisms from an object to itself. 1st order objects in X1 define binary arrows between pairs
of objects. Higher-order objects in Xn, n ≥ 2, induce higher-order relationships including symmetries. Structure
discovery in causal inference and RL is modeled as the extension problem of “filling horns”, where the horn Λ^n_k is
the subset of an n-simplex that results from removing its interior and the face opposite the
k-th vertex. (B): Predictive state horns are partial fragments of potential tests, which can faithfully embed a category of
dynamical systems into a simplicial object, and can be seen as a special case of the nerve of a category that defines a
fully faithful embedding of arbitrary categories in terms of simplicial objects. (C): Causal “inner” horns, fragments of
causal models, define a quasicategory (Joyal, 2002), a simplicial object where composition of oriented n-simplices
involves solving a lifting problem in a commutative diagram under a weak homotopy. (D): “Outer” horns define more
general compositional structures that are analyzed in ∞-category theory (Lurie, 2022).


a free algebra over paired actions and observations that under certain conditions is guaranteed to faithfully embed a
dynamical system without the need to explicitly construct “belief states”, probability distributions over latent
states, as in a partially observable Markov decision process (POMDP). The core tests define a minimal left k-module
over an Abelian group defined as a free algebra over all possible tests, and induce higher-order symmetries.
We formulate structure discovery of causal and PSR models in terms of simplicial objects (see Figure 1 and Figure 2),
which were originally introduced as a combinatorial representation in algebraic topology (May, 1992), but have become
the foundation for higher-order category theory (Boardman and Vogt, 1973; Joyal, 2002; Lurie, 2022). Formally,
simplicial objects (May, 1992) are contravariant functors X : ∆^op → C from the category of ordinal numbers, whose
objects are [n] = (0, 1, . . . , n), n ≥ 0, and whose arrows are non-decreasing maps, into an underlying category, such
as a universal causal model (Mahadevan, 2022a), or a universal decision model (Mahadevan, 2021a). The 0th order
simplices are objects in a universal causal model, or a universal decision model. The 1-simplices define arrows encoding
directional relationships between causal or decision objects. Simplicial objects of higher dimensions n ≥ 2 encode
higher-order relationships between groups of objects.
The general problem of horn filling has been solved in different ways in higher-order category theory, including weak
Kan complexes (Boardman and Vogt, 1973), quasi-categories (Joyal, 2002), and ∞-categories (Lurie, 2022). We will
review this literature, and then show the connections to structure discovery in causal inference and RL. Structure
prediction in causal inference and RL can both be formulated as filling horns of simplicial objects, subsets of functors
that result from interventions that remove the interior and the face opposite a given vertex. These interventions can be
defined in terms of a sequence of elementary order-preserving morphisms in the category of ordinal numbers ∆, whose
objects are the ordinals [n] = {0, 1, . . . , n} and whose arrows are defined as compositions of elementary injections
di : [n − 1] → [n] and elementary surjections si : [n] → [n − 1] (see Figure 2). In particular, the units in a causal study
or the states of an underlying dynamical system are x, y, . . . ∈ X0, the 0-simplices of X. If x, y ∈ X0 and f ∈ X1,
we write a potential causal effect or action in a dynamical system as the arrow f : x → y, meaning that x = d1 f
and y = d0 f. Here, di denotes the face map induced by the elementary injection [n − 1] → [n] that skips element i. The
set of potential causal influences or actions between x and y is denoted as X1(x, y). The degenerate simplex s0 x denotes
the unit morphism (a cycle) 1x : x → x, where si is induced by the elementary surjection [n] → [n − 1] that repeats
element i. Higher-order simplices Xi, i ≥ 2, are used to capture higher-order causal effects or action transitions among
groups of causal objects or dynamical system states of size n ≥ 2. Note in Figure 1, there is one arrow out of X0,
denoting the self-mapping from an object to itself. The two arrows from X1 to X0 represent the head and tail of each
morphism in X1. There are in general n + 1 arrows in and out of Xn, for n > 0. For example, X2 has three arrows going
back into X1, because those are the three arrows that are contained in the 2-simplex.

2 Simplicial Objects

Simplicial objects have long been a foundation for algebraic topology (May, 1999, 1992), and more recently higher-order
category theory (Boardman and Vogt, 1973; Joyal, 2002; Lurie, 2022). The category ∆ has non-empty ordinals
[n] = {0, 1, . . . , n} as objects, and order-preserving maps [m] → [n] as arrows. An important property of ∆ is that any
map can be decomposed as a surjection followed by an injection, each of which in turn decomposes into
a sequence of elementary injections di : [n] → [n + 1], which omits i ∈ [n + 1], and a sequence of elementary surjections
si : [n] → [n − 1], which repeats i ∈ [n − 1]. The fundamental simplex ∆([n]) is the presheaf of all morphisms into [n],
that is, the representable functor ∆(−, [n]). 2
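
The decomposition of an order-preserving map into a surjection followed by an injection can be illustrated with a small sketch. It is not taken from the paper: a map α : [m] → [n] is represented as the tuple of its values, and `epi_mono_factor` and `compose` are hypothetical helper names chosen for this illustration.

```python
def epi_mono_factor(alpha):
    """Factor a nondecreasing map alpha: [m] -> [n], given as the tuple
    (alpha(0), ..., alpha(m)), into a surjection followed by an injection."""
    if any(a > b for a, b in zip(alpha, alpha[1:])):
        raise ValueError("alpha must be nondecreasing")
    image = sorted(set(alpha))                  # values hit by alpha, in order
    rank = {value: r for r, value in enumerate(image)}
    surjection = tuple(rank[v] for v in alpha)  # [m] ->> [k], k = len(image) - 1
    injection = tuple(image)                    # [k] >-> [n]
    return surjection, injection


def compose(outer, inner):
    """Composite of two maps stored as value tuples: x |-> outer[inner[x]]."""
    return tuple(outer[x] for x in inner)


alpha = (0, 0, 2, 3, 3)             # a map [4] -> [4] whose image omits 1 and 4
surj, inj = epi_mono_factor(alpha)
recomposed = compose(inj, surj)     # should give back alpha
```

Each of the two factors could be split further into elementary maps (one skipped value or one repeated value at a time); the sketch stops at the coarse factorization.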
We will denote simplicial objects over the category of sets by X[n] = Xn to mean the functor [n] → X, mapping both
the objects [n] = {0, 1, . . . , n} as well as its morphisms to a set X. Thus, X0 is a set of objects, which are mappings
from [0] = {0} to X, and as [0] contains only one element, each such mapping must pick out a single element of X.
Note that as the only morphism in [0] is 0 → 0, its image gives the identity mapping on each element in the set X.
Similarly, X1 is the functor mapping [1] = {0, 1} and its non-identity morphism 0 → 1 to X, which defines arrows
between elements of X0 . Proceeding, X[2] is the functor mapping [2] = {0, 1, 2}, with its non-identity morphisms
0 → 1, 1 → 2, 0 → 2 into a category, so it picks out “triangles", or oriented simplices of order 2.
The Yoneda Lemma (MacLane, 1971) assures us that an n-simplex x ∈ Xn can be identified with the corresponding
map ∆[n] → X. Every morphism f : [n] → [m] in ∆ is functorially mapped to the map ∆[n] → ∆[m] in S. Figure 2
illustrates simplicial objects, in particular showing that any order-preserving morphism can be decomposed into a
sequence of elementary degeneracy and face operators. 3

2
A representable functor is a set-valued functor that is naturally isomorphic to a hom-functor, such as ∆(−, [n]).
3
The LaTeX source for this diagram is from http://homepages.math.uic.edu/~jlv/seminars/infinitycats/inftycats.tex.


Category ∆: Objects are the ordinal numbers [n] = (0, 1, . . . , n)
    Arrows are non-decreasing maps [n] → [m]
    Elementary codegeneracy maps: si : [n] → [n − 1], repeats i in the co-domain
    Elementary coface maps: di : [n] → [n + 1], skips i in the co-domain
    Any arrow [m] → [n] is composed of elementary coface and codegeneracy maps
    [Dot diagrams not recoverable: example maps labelled s1, d1, d2, d1, s2, d2.]

[n]: Objects are numbers 0, 1, . . . , n
    |Hom[n](a, b)| = 1 if a ≤ b, and Hom[n](a, b) = ∅ otherwise

Simplicial set: sSet: Hom∆(∆^op, Set)
    Objects are functors S = {Sn = S([n])}, n ≥ 0, with
    face maps S(di) : Sn → Sn−1
    degeneracy maps S(si) : Sn → Sn+1
    Morphisms f : S → T are natural transformations

Simplicial object: sObj: Hom∆(∆^op, C) for C any category


Figure 2: Simplicial objects are contravariant functors from ordinal numbers into any category.

Any morphism in the category ∆ can be written as a composite of coface and codegeneracy operators, where the coface
operator δi : [n − 1] → [n], 0 ≤ i ≤ n, is the injection that skips i:

\[
\delta_i(j) =
\begin{cases}
j, & 0 \le j \le i - 1 \\
j + 1, & i \le j \le n - 1
\end{cases}
\]

Analogously, the codegeneracy operator σj : [n + 1] → [n], 0 ≤ j ≤ n, is the surjection that takes the value j twice:

\[
\sigma_j(k) =
\begin{cases}
k, & 0 \le k \le j \\
k - 1, & j < k \le n + 1
\end{cases}
\]

The compositions of these arrows satisfy the following well-known identities (May, 1992; Richter, 2020):

\[
\begin{aligned}
\delta_j \circ \delta_i &= \delta_i \circ \delta_{j-1}, && i < j \\
\sigma_j \circ \sigma_i &= \sigma_i \circ \sigma_{j+1}, && i \le j \\
\sigma_j \circ \delta_i &=
\begin{cases}
\delta_i \circ \sigma_{j-1}, & i < j \\
1_{[n]}, & i = j,\ j + 1 \\
\delta_{i-1} \circ \sigma_j, & i > j + 1
\end{cases}
\end{aligned}
\]

Definition 1. The set of all functors [n] → C for any category C is denoted as Nn(C). The assignment [n] ↦ Nn(C) is
contravariant in [n]: the image of each coface operator δi : [n − 1] → [n] is the face operator
N(δi) = di : Nn(C) → Nn−1(C), and the image of each codegeneracy operator σj : [n + 1] → [n] is the degeneracy
operator N(σj) = sj : Nn(C) → Nn+1(C).
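
As a sanity check, the standard forms of these composition identities can be verified by brute force on small indices: δj ◦ δi = δi ◦ δj−1 for i < j, σj ◦ σi = σi ◦ σj+1 for i ≤ j, and σj ◦ δi equal to δi ◦ σj−1 for i < j, to the identity for i = j or i = j + 1, and to δi−1 ◦ σj for i > j + 1. The sketch below is illustrative rather than part of the paper: it treats δi (the injection that skips i) and σj (the surjection that takes the value j twice) as functions on all non-negative integers, ignoring the bounds of [n], and the function names are hypothetical.

```python
from itertools import product


def delta(i):
    """Injection delta_i : [n-1] -> [n] whose image omits i."""
    return lambda j: j if j < i else j + 1


def sigma(j):
    """Surjection sigma_j : [n+1] -> [n] that takes the value j twice."""
    return lambda k: k if k <= j else k - 1


def compose(g, f):
    """The composite g o f."""
    return lambda x: g(f(x))


def agree(f, g, points):
    """True if f and g take the same value at every sample point."""
    return all(f(x) == g(x) for x in points)


def check_identities(max_index=5):
    """Brute-force check of the composition identities on small inputs."""
    points = range(max_index + 3)
    identity = lambda x: x
    for i, j in product(range(max_index + 1), repeat=2):
        if i < j:
            assert agree(compose(delta(j), delta(i)),
                         compose(delta(i), delta(j - 1)), points)
            assert agree(compose(sigma(j), delta(i)),
                         compose(delta(i), sigma(j - 1)), points)
        if i <= j:
            assert agree(compose(sigma(j), sigma(i)),
                         compose(sigma(i), sigma(j + 1)), points)
        if i in (j, j + 1):
            assert agree(compose(sigma(j), delta(i)), identity, points)
        if i > j + 1:
            assert agree(compose(sigma(j), delta(i)),
                         compose(delta(i - 1), sigma(j)), points)
    return True
```

A finite check of this kind is not a proof, but it catches index slips in the case distinctions quickly.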

2.1 Full and Faithful Embedding of Categories

Definition 2. A covariant functor F : C → D from category C to category D is defined by the following data:

• An object F X (sometimes written as F(X)) of the category D for each object X in category C.


• An arrow F (f ) : F X → F Y in category D for every arrow f : X → Y in category C.


• The preservation of identity and composition: F idX = idF X and F(g ◦ f) = (F g) ◦ (F f) for any composable
arrows f : X → Y, g : Y → Z.
Definition 3. A contravariant functor F : C → D from category C to category D is defined exactly like the covariant
functor, except all the arrows are reversed. In the contravariant functor F : C^op → D, every morphism f : X → Y is
assigned the reverse morphism F f : F Y → F X in category D.

• For every object X in a category C, there exists a covariant functor C(X, −) : C → Set that assigns to each
object Z in C the set of morphisms C(X, Z), and to each morphism f : Y → Z, the pushforward mapping
f∗ : C(X, Y ) → C(X, Z).
• For every object X in a category C, there exists a contravariant functor C(−, X) : C^op → Set that assigns
to each object Z in C the set of morphisms C(Z, X), and to each morphism f : Y → Z, the pullback
mapping f^∗ : C(Z, X) → C(Y, X). Note how “contravariance” implies the morphisms in the original
category are reversed through the functorial mapping, whereas in covariance, the morphisms are not flipped.
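
The difference between the pushforward and the pullback can be seen in a minimal sketch in the category of sets, with Python functions standing in for morphisms. The names `pushforward` and `pullback` and the particular functions are illustrative assumptions, not notation from the paper.

```python
def pushforward(f):
    """f_* : C(X, Y) -> C(X, Z), sending g to f o g, for f : Y -> Z."""
    return lambda g: (lambda x: f(g(x)))


def pullback(f):
    """f^* : C(Z, X) -> C(Y, X), sending g to g o f, for f : Y -> Z."""
    return lambda g: (lambda y: g(f(y)))


# Two composable arrows f1 : Y -> Z and f2 : Z -> W (arbitrary choices).
f1 = lambda y: y + 1
f2 = lambda z: 2 * z
f2_after_f1 = lambda y: f2(f1(y))

g = lambda x: x * x        # an arrow X -> Y, used in the covariant case
k = lambda w: w - 3        # an arrow W -> X, used in the contravariant case

# Covariant: (f2 o f1)_* = f2_* o f1_*  (order of composition preserved)
lhs_cov = pushforward(f2_after_f1)(g)
rhs_cov = pushforward(f2)(pushforward(f1)(g))

# Contravariant: (f2 o f1)^* = f1^* o f2^*  (order of composition reversed)
lhs_contra = pullback(f2_after_f1)(k)
rhs_contra = pullback(f1)(pullback(f2)(k))
```

Both sides of each equation are functions, so they are compared pointwise on sample inputs rather than with `==`.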
Definition 4. Let F : C → D be a functor from category C to category D. If, for every pair of objects X, Y in C, the
mapping f ↦ F f from C(X, Y) to D(F X, F Y) is
• injective, then the functor F is defined to be faithful.
• surjective, then the functor F is defined to be full.
• bijective, then the functor F is defined to be fully faithful.
Definition 5. A pair of adjoint functors is defined as F : C → D and G : D → C, where F is considered the right
adjoint, and G is considered the left adjoint, meaning that there is a bijection C(G(d), c) ≅ D(d, F(c)) that is natural
in both d ∈ D and c ∈ C.

\[
G : D \rightleftarrows C : F, \qquad G \dashv F
\]
Definition 6. The nerve of a category C is built from the sets of composable morphisms of length n, for n ≥ 1. Let
Nn(C) denote the set of sequences of composable morphisms of length n:

\[
\{\, C_0 \xrightarrow{f_1} C_1 \xrightarrow{f_2} \cdots \xrightarrow{f_n} C_n \mid C_i \text{ is an object in } C,\ f_i \text{ is a morphism in } C \,\}
\]

The set of n-tuples of composable arrows in C, denoted by Nn(C), can be viewed as the set of functors from the ordinal
[n] to C. Note that any nondecreasing map α : [m] → [n] determines a map of sets Nn(C) → Nm(C) by precomposition.
The nerve of a category C is the simplicial set N•(C) : ∆^op → Set, which maps the ordinal number object [n] to the
set Nn(C).
The importance of the nerve of a category comes from a key result (Lurie, 2022), showing it defines a full and faithful
embedding of a category:
Theorem 1. (Lurie, 2022, Tag 002Y): The nerve functor N• : Cat → Set∆ is fully faithful. More specifically, there
is a bijection θ defined as:

\[
\theta : \mathrm{Cat}(C, C') \to \mathrm{Set}_\Delta(N_\bullet(C), N_\bullet(C'))
\]

In general, the functor G from a simplicial object X to a category C can be lossy. For example, we can define the
objects of C to be the elements of X0, and the morphisms of C as the elements f ∈ X1, where f : a → b with d1 f = a
and d0 f = b, and with s0 a, a ∈ X0, defining the identity morphisms 1a. Composition in this case can be defined as the
free algebra defined over elements of X1, subject to the constraints given by elements of X2. For example, if x ∈ X2,
we can impose the requirement that d1 x = d0 x ◦ d2 x. Such a definition of the left adjoint would be quite lossy because
it only preserves the structure of the simplicial object X up to the 2-simplices. The right adjoint from a category to its
associated simplicial object, in contrast, constructs a full and faithful embedding of a category into a simplicial set. In
particular, the nerve of a category is such a right adjoint.
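
To make the nerve concrete, here is a minimal sketch for the simplest kind of category, a finite poset, where there is at most one arrow between any two objects and so a chain of objects determines the chain of arrows. The choice of example (divisors of 6 under divisibility) and the helper names `nerve`, `face` and `degeneracy` are assumptions made for illustration only; a general category would also need explicit arrows and a composition table.

```python
from itertools import product


def nerve(objects, leq, n):
    """n-simplices of the nerve of a poset: chains c0 <= c1 <= ... <= cn."""
    return [c for c in product(objects, repeat=n + 1)
            if all(leq(c[k], c[k + 1]) for k in range(n))]


def face(i, simplex):
    """d_i: delete the i-th object (composing the two arrows around it)."""
    return simplex[:i] + simplex[i + 1:]


def degeneracy(i, simplex):
    """s_i: repeat the i-th object (inserting its identity arrow)."""
    return simplex[:i + 1] + simplex[i:]


# Illustrative category: the divisors of 6, ordered by divisibility.
objects = [1, 2, 3, 6]
divides = lambda a, b: b % a == 0

edges = nerve(objects, divides, 1)       # 1-simplices, including identities
triangles = nerve(objects, divides, 2)   # 2-simplices

x = (1, 2, 6)                            # the 2-simplex 1 -> 2 -> 6
composite = face(1, x)                   # d1 x, the arrow 1 -> 6
first, second = face(2, x), face(0, x)   # d2 x = (1 -> 2), d0 x = (2 -> 6)
```

In this example the relation d1 x = d0 x ◦ d2 x holds automatically, because the only arrow from 1 to 6 is the composite of 1 → 2 and 2 → 6.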
Example 1. Given a category C, and an n-simplex σ of the simplicial set Nn(C), which we can identify with the
sequence:

\[
\sigma = C_0 \xrightarrow{f_1} C_1 \xrightarrow{f_2} \cdots \xrightarrow{f_n} C_n
\]

the face operator d0 applied to σ yields the sequence

\[
d_0 \sigma = C_1 \xrightarrow{f_2} C_2 \xrightarrow{f_3} \cdots \xrightarrow{f_n} C_n
\]

where the object C0 is “deleted” along with the morphism f1 leaving it.
Example 2. Given a category C, and an n-simplex σ of the simplicial set Nn(C), which we can identify with the
sequence:

\[
\sigma = C_0 \xrightarrow{f_1} C_1 \xrightarrow{f_2} \cdots \xrightarrow{f_n} C_n
\]

the face operator dn applied to σ yields the sequence

\[
d_n \sigma = C_0 \xrightarrow{f_1} C_1 \xrightarrow{f_2} \cdots \xrightarrow{f_{n-1}} C_{n-1}
\]

where the object Cn is “deleted” along with the morphism fn entering it.
Example 3. Given a category C, and an n-simplex σ of the simplicial set Nn(C), which we can identify with the
sequence:

\[
\sigma = C_0 \xrightarrow{f_1} C_1 \xrightarrow{f_2} \cdots \xrightarrow{f_n} C_n
\]

the face operator di, 0 < i < n, applied to σ yields the sequence

\[
d_i \sigma = C_0 \xrightarrow{f_1} C_1 \xrightarrow{f_2} \cdots\ C_{i-1} \xrightarrow{f_{i+1} \circ f_i} C_{i+1}\ \cdots \xrightarrow{f_n} C_n
\]

where the object Ci is “deleted” and the morphism fi is composed with the morphism fi+1.
Example 4. Given a category C, and an n-simplex σ of the simplicial set Nn(C), which we can identify with the
sequence:

\[
\sigma = C_0 \xrightarrow{f_1} C_1 \xrightarrow{f_2} \cdots \xrightarrow{f_n} C_n
\]

the degeneracy operator si, 0 ≤ i ≤ n, applied to σ yields the sequence

\[
s_i \sigma = C_0 \xrightarrow{f_1} C_1 \xrightarrow{f_2} \cdots\ C_i \xrightarrow{1_{C_i}} C_i \xrightarrow{f_{i+1}} C_{i+1}\ \cdots \xrightarrow{f_n} C_n
\]

where the object Ci is “repeated” by inserting its identity morphism 1Ci.


Definition 7. Given a category C, and an n-simplex σ of the simplicial set Nn(C), which we can identify with the
sequence:

\[
\sigma = C_0 \xrightarrow{f_1} C_1 \xrightarrow{f_2} \cdots \xrightarrow{f_n} C_n
\]

we define σ as a degenerate simplex if some fi in the above sequence is an identity morphism, in which case Ci−1 and
Ci are equal.

3 Horns of Simplicial Objects

One of the fundamental contributions of this paper is to show that structure discovery in causal inference and RL
both involve solving extension problems defined by commutative diagrams, which are known as “horn filling” in
higher-order category theory (Boardman and Vogt, 1973; Joyal, 2002; Lurie, 2022).


3.1 Simplicial Subsets and Horns

We now describe ways of modeling subsets of simplicial objects, which are simplicial objects in their own right. We
introduce the crucial concept of the boundary and horn of a simplicial set.
Definition 8. The standard simplex ∆^n is the simplicial set defined by the construction

\[
([m] \in \Delta) \mapsto \mathrm{Hom}_\Delta([m], [n])
\]

By convention, ∆^{−1} := ∅. The standard 0-simplex ∆^0 maps each [n] ∈ ∆^op to the single element set {•}.
Definition 9. Let S• denote a simplicial set. If for every integer n ≥ 0, we are given a subset Tn ⊆ Sn, such that the
face and degeneracy maps

\[
d_i : S_n \to S_{n-1}, \qquad s_i : S_n \to S_{n+1}
\]

applied to Tn result in

\[
d_i : T_n \to T_{n-1}, \qquad s_i : T_n \to T_{n+1}
\]

then the collection {Tn}, n ≥ 0, defines a simplicial subset T• ⊆ S•.
Definition 10. The boundary is a simplicial set ∂∆^n : ∆^op → Set defined as

\[
(\partial\Delta^n)([m]) = \{\alpha \in \mathrm{Hom}_\Delta([m], [n]) : \alpha \text{ is not surjective}\}
\]

Note that the boundary ∂∆^n is a simplicial subset of the standard n-simplex ∆^n.
Definition 11. The Horn Λ^n_i : ∆^op → Set is defined as

\[
(\Lambda^n_i)([m]) = \{\alpha \in \mathrm{Hom}_\Delta([m], [n]) : [n] \not\subseteq \alpha([m]) \cup \{i\}\}
\]


Intuitively, the Horn Λ^n_i can be viewed as the simplicial subset that results from removing the interior of the n-simplex
∆^n together with the face opposite its ith vertex. Lifting problems provide elegant ways to define basic notions in
a wide variety of areas in mathematics. For example, the notions of injective and surjective functions, the notion of
separation in topology, and many other basic constructs can be formulated as solutions to lifting problems.
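
The set-level definitions of the standard simplex, its boundary and its horns can be enumerated directly for small n. The following sketch is illustrative and not part of the paper: a nondecreasing map α : [m] → [n] is stored as the tuple of its values, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def monotone_maps(m, n):
    """All nondecreasing maps [m] -> [n], each as a tuple of m + 1 values."""
    return list(combinations_with_replacement(range(n + 1), m + 1))


def simplex(n, m):
    """The set Delta^n([m]) = Hom([m], [n])."""
    return monotone_maps(m, n)


def boundary(n, m):
    """Maps [m] -> [n] that are not surjective."""
    target = set(range(n + 1))
    return [a for a in monotone_maps(m, n) if set(a) != target]


def horn(n, i, m):
    """Maps alpha with [n] not contained in image(alpha) together with {i}."""
    target = set(range(n + 1))
    return [a for a in monotone_maps(m, n) if not target <= (set(a) | {i})]


inner_edges = horn(2, 1, 1)   # 1-simplices of the inner horn of the 2-simplex
outer_edges = horn(2, 0, 1)   # 1-simplices of one of the outer horns
```

For n = 2 the inner horn with i = 1 omits exactly the edge (0, 2), and the outer horn with i = 0 omits exactly the edge (1, 2), matching the description of a horn as the boundary with one face removed.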

Let us illustrate this abstract definition with the following diagrams. Consider the problem of composing 1-dimensional
simplices to form a 2-dimensional simplicial object. Each simplicial subset of an n-simplex induces a horn Λ^n_k, where
0 ≤ k ≤ n. Intuitively, a horn is a subset of a simplicial object that results from removing the interior of the n-simplex
and the face opposite the kth vertex. Consider the three horns defined below. The dashed arrow ⇢ indicates edges of
the 2-simplex ∆^2 not contained in the horns.

[Diagram not recoverable: three triangles on the vertices {0}, {1}, {2}. The middle one is the inner horn Λ^2_1, with edges 0 → 1 and 1 → 2 and the edge 0 → 2 dashed; the other two are the outer horns Λ^2_0, with edges 0 → 1 and 0 → 2, and Λ^2_2, with edges 0 → 2 and 1 → 2.]

The inner horn Λ^2_1 is the middle diagram above, and admits an easy solution to the “horn filling” problem of composing
the simplicial subsets. The two outer horns on either end pose a more difficult challenge. A considerable elaboration
of the theoretical machinery in category theory is required to describe the various solutions proposed, which led to
different ways of defining higher-order category theory (Boardman and Vogt, 1973; Joyal, 2002; Lurie, 2022), which
we summarize below.
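Definition 12. Let C be a category. A lifting problem in C is a commutative diagram σ in C.

\[
\begin{array}{ccc}
A & \xrightarrow{\ \mu\ } & X \\
{\scriptstyle f}\downarrow & & \downarrow{\scriptstyle p} \\
B & \xrightarrow{\ \nu\ } & Y
\end{array}
\]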


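Definition 13. Let C be a category. A solution to a lifting problem in C is a morphism h : B → X in C satisfying
p ◦ h = ν and h ◦ f = µ as indicated in the diagram below.

\[
\begin{array}{ccc}
A & \xrightarrow{\ \mu\ } & X \\
{\scriptstyle f}\downarrow & {}^{h}\!\nearrow & \downarrow{\scriptstyle p} \\
B & \xrightarrow{\ \nu\ } & Y
\end{array}
\]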
Definition 14. Let C be a category. If we are given two morphisms f : A → B and p : X → Y in C, we say that f has
the left lifting property with respect to p, or that p has the right lifting property with respect to f, if for every pair of
morphisms µ : A → X and ν : B → Y satisfying the equation p ◦ µ = ν ◦ f, the associated lifting problem indicated
in the diagram below

\[
\begin{array}{ccc}
A & \xrightarrow{\ \mu\ } & X \\
{\scriptstyle f}\downarrow & {}^{h}\!\nearrow & \downarrow{\scriptstyle p} \\
B & \xrightarrow{\ \nu\ } & Y
\end{array}
\]

admits a solution given by the map h : B → X satisfying p ◦ h = ν and h ◦ f = µ.


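Example 5. Given the paradigmatic non-surjective morphism f : ∅ → {•}, any morphism p that has the right lifting
property with respect to f is a surjective mapping.

\[
\begin{array}{ccc}
\emptyset & \xrightarrow{\ \mu\ } & X \\
{\scriptstyle f}\downarrow & {}^{h}\!\nearrow & \downarrow{\scriptstyle p} \\
\{\bullet\} & \xrightarrow{\ \nu\ } & Y
\end{array}
\]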

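Example 6. Given the paradigmatic non-injective morphism f : {•, •} → {•}, any morphism p that has the right
lifting property with respect to f is an injective mapping.

\[
\begin{array}{ccc}
\{\bullet, \bullet\} & \xrightarrow{\ \mu\ } & X \\
{\scriptstyle f}\downarrow & {}^{h}\!\nearrow & \downarrow{\scriptstyle p} \\
\{\bullet\} & \xrightarrow{\ \nu\ } & Y
\end{array}
\]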
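
For finite sets, the two lifting-property characterizations above can be checked exhaustively. The sketch below is an illustration under stated assumptions, not the paper's construction: functions between finite sets are stored as Python dicts, and `all_maps` and `has_right_lifting_property` are hypothetical helper names.

```python
from itertools import product


def all_maps(domain, codomain):
    """Every function domain -> codomain between finite sets, as a dict."""
    domain = list(domain)
    return [dict(zip(domain, values))
            for values in product(list(codomain), repeat=len(domain))]


def has_right_lifting_property(p, X, Y, f, A, B):
    """True if every commutative square (mu, nu) from f : A -> B to
    p : X -> Y admits a diagonal h : B -> X with h o f = mu and p o h = nu."""
    for mu in all_maps(A, X):
        for nu in all_maps(B, Y):
            if any(p[mu[a]] != nu[f[a]] for a in A):
                continue                  # the outer square does not commute
            solved = any(
                all(h[f[a]] == mu[a] for a in A)
                and all(p[h[b]] == nu[b] for b in B)
                for h in all_maps(B, X)
            )
            if not solved:
                return False
    return True


# The two test morphisms: f : {} -> {*} and f : {a, b} -> {*}.
empty_to_point = ({}, [], ["*"])
two_to_point = ({"a": "*", "b": "*"}, ["a", "b"], ["*"])

# p1 is surjective but not injective; p2 is injective but not surjective.
p1, X1, Y1 = {0: "u", 1: "u", 2: "v"}, [0, 1, 2], ["u", "v"]
p2, X2, Y2 = {0: "u"}, [0], ["u", "v"]

p1_lifts_empty = has_right_lifting_property(p1, X1, Y1, *empty_to_point)
p1_lifts_pair = has_right_lifting_property(p1, X1, Y1, *two_to_point)
p2_lifts_empty = has_right_lifting_property(p2, X2, Y2, *empty_to_point)
p2_lifts_pair = has_right_lifting_property(p2, X2, Y2, *two_to_point)
```

The search is exponential in the sizes of the sets, so it is only meant for very small examples such as these.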
Definition 15. If S is a collection of morphisms in category C, a morphism f : A → B has the left lifting property
with respect to S if it has the left lifting property with respect to every morphism in S. Analogously, we say a morphism
p : X → Y has the right lifting property with respect to S if it has the right lifting property with respect to every
morphism in S.
Definition 16. Let C and C′ be a pair of objects in a category C. We say C is a retract of C′ if there exist maps
i : C → C′ and r : C′ → C such that r ◦ i = idC.
Definition 17. Let C be a category. We say a morphism f : C → D is a retract of another morphism f′ : C′ → D′ if
it is a retract of f′ when viewed as an object of the functor category Hom([1], C). A collection of morphisms T of C is
closed under retracts if for every pair of morphisms f, f′ of C, if f is a retract of f′, and f′ is in T, then f is also in T.

3.2 Fibrations and Kan Complexes

Definition 18. Let f : X → S be a morphism of simplicial sets. We say f is a Kan fibration if, for each n > 0, and
each 0 ≤ i ≤ n, every lifting problem

\[
\begin{array}{ccc}
\Lambda^n_i & \xrightarrow{\ \sigma_0\ } & X \\
\downarrow & {}^{\sigma}\!\nearrow & \downarrow{\scriptstyle f} \\
\Delta^n & \xrightarrow{\ \bar{\sigma}\ } & S
\end{array}
\]

admits a solution. More precisely, for every map of simplicial sets σ0 : Λ^n_i → X and every n-simplex σ̄ : ∆^n → S
extending f ◦ σ0, we can extend σ0 to an n-simplex σ : ∆^n → X satisfying f ◦ σ = σ̄.


Example 7. A simplicial set X whose projection map X → ∆^0 is a Kan fibration is called a Kan complex.
Example 8. Any isomorphism between simplicial sets is a Kan fibration.
Example 9. The collection of Kan fibrations is closed under retracts.
Definition 19. Let X and Y be simplicial sets, and suppose we are given a pair of morphisms f0, f1 : X → Y. A
homotopy from f0 to f1 is a morphism h : ∆^1 × X → Y satisfying f0 = h|{0}×X and f1 = h|{1}×X.

3.3 Higher-order Categories

We now formally introduce higher-order categories, building on the framework proposed in a number of formalisms
(Boardman and Vogt, 1973; Joyal, 2002; Lurie, 2022).
Definition 20. (Lurie, 2022) An ∞-category is a simplicial object S• which satisfies the following condition:

• For 0 < i < n, every map of simplicial sets σ0 : Λ^n_i → S• can be extended to a map σ : ∆^n → S•.

This definition emerges out of a common generalization of two other conditions on a simplicial set S•:

1. Property K: For n > 0 and 0 ≤ i ≤ n, every map of simplicial sets σ0 : Λ^n_i → S• can be extended to a map
σ : ∆^n → S•.
2. Property C: For 0 < i < n, every map of simplicial sets σ0 : Λ^n_i → S• can be extended uniquely to a map
σ : ∆^n → S•.

Simplicial objects that satisfy property K were defined above to be Kan complexes. Simplicial objects that satisfy
property C above can be identified with the nerve of a category, which yields a full and faithful embedding of a category
in the category of simplicial sets. Definition 20 generalizes both of these definitions; such objects were called
quasicategories in (Joyal, 2002) and weak Kan complexes in (Boardman and Vogt, 1973) when C is a category.
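
The contrast between inner and outer horns can be seen already in dimension 2 for the nerve of a small poset. The sketch below is illustrative and rests on stated assumptions: the category is the chain 0 ≤ 1 ≤ 2, a 2-simplex is a triple a ≤ b ≤ c, a horn is given by prescribing some of its faces, and the helper names are hypothetical.

```python
from itertools import product


def two_simplices(objects, leq):
    """2-simplices of the nerve of a poset: triples a <= b <= c."""
    return [(a, b, c) for a, b, c in product(objects, repeat=3)
            if leq(a, b) and leq(b, c)]


def face(i, simplex):
    """d_i: delete the i-th vertex of the simplex."""
    return simplex[:i] + simplex[i + 1:]


def fillers(simplices, horn_faces):
    """2-simplices whose i-th face equals horn_faces[i] for each given i."""
    return [s for s in simplices
            if all(face(i, s) == edge for i, edge in horn_faces.items())]


chain = [0, 1, 2]
leq = lambda a, b: a <= b
simplices = two_simplices(chain, leq)

# Inner horn (faces d2 and d0 given): the composable edges 0 -> 1 and 1 -> 2.
inner = fillers(simplices, {2: (0, 1), 0: (1, 2)})

# Outer horn (faces d2 and d1 given): the edges 0 -> 2 and 0 -> 1.
# A filler would need an edge 2 -> 1, which does not exist in the chain.
outer = fillers(simplices, {2: (0, 2), 1: (0, 1)})
```

The inner horn has exactly one filler, as Property C requires of a nerve, while this outer horn has none, so the nerve of the chain satisfies Property C but not Property K.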

4 Universal Reinforcement Learning


In this section, we briefly review the framework of universal decision models (Mahadevan, 2021a), where a decision
making object is defined as a particular type of category. We will focus in particular on predictive state representations
(PSRs) (Singh et al., 2004), but we will include a brief review of Markov decision processes (MDPs), which are
stochastic dynamical systems widely used in the literature on reinforcement learning. We show that predictive state
representations define a simplicial object representation of a category, and prove the Universal PSR theorem based on
the fact that the nerve of a PSR defines a full and faithful embedding of the category of PSRs in the category of sets. We
show that simplicial object representations of PSRs define a quasicategory.

4.1 UDMs based on Markov Decision Processes

We now briefly describe the category of UDMs, where each object represents a (finite) Markov decision process (MDP)
(Puterman, 1994). Recall that an MDP is defined by a tuple ⟨S, A, Ψ, P, R⟩, where S is a discrete set of states, A is
the discrete set of actions, Ψ ⊂ S × A is the set of admissible state-action pairs, P : Ψ × S → [0, 1] is the transition
probability function specifying the one-step dynamics of the model, where P(s, a, s′) is the transition probability of
moving from state s to state s′ in one step under action a, and R : Ψ → ℝ is the expected reward function, where
R(s, a) is the expected reward for executing action a in state s. MDP homomorphisms can be viewed as a principled
way of abstracting the state (action) set of an MDP into a “simpler” MDP that nonetheless preserves some important
properties, usually referred to as the stochastic substitution property (SSP).
Definition 21. A UDM MDP homomorphism (Ravindran and Barto, 2003) from object M = ⟨S, A, Ψ, P, R⟩ to
M′ = ⟨S′, A′, Ψ′, P′, R′⟩, denoted h : M ↠ M′, is defined by a tuple of surjections ⟨f, {g_s | s ∈ S}⟩, where
f : S ↠ S′, g_s : A_s ↠ A′_{f(s)}, where h((s, a)) = ⟨f(s), g_s(a)⟩, for s ∈ S, such that the stochastic substitution
property and reward respecting properties below are respected:

\[
P'(f(s), g_s(a), f(s')) = \sum_{s'' \in [s']_f} P(s, a, s'') \tag{1}
\]

\[
R'(f(s), g_s(a)) = R(s, a) \tag{2}
\]

Given this definition, the following result is straightforward to prove.


[Figure 3 diagram: the simplicial objects T_0, T_1, T_2, ..., T_n (sequences of action-observation pairs of length n), each paired with a symmetric monoidal category P_0, P_1, P_2, ..., P_n encoding the conditional probabilities P(t|h) of tests given histories.]

Figure 3: Left: A predictive state representation (PSR) can be defined in terms of its underlying system dynamics
matrix (Singh et al., 2004). The columns of this (infinite) matrix are tests t = a_1 o_1 … a_n o_n and the rows are histories
h = a_1 o_1 … a_k o_k. Right: the simplicial object encoded by a PSR. T_n represents the n-simplicial object of tests
or histories, and P_n represents a symmetric monoidal category (Fong and Spivak, 2018) that can be used to encode
conditional probabilities P(t|h) of each test t given its history h. Note that, as in Figure 2, there are n arrows leading
from the n-simplex T_n back to the (n − 1)-simplex T_{n−1}, as well as to the (n + 1)-simplex T_{n+1}, which are not shown for
clarity. The conditional probability for each test t as a function of history h, P(t|h), can be defined using an enriched
symmetric monoidal category, exactly as defined in (Bradley et al., 2022), which defines sentence completions in
English by deep NLP networks using conditional probabilities as enriched categories.

Theorem 2. The UDM category C_MDP is defined as one where each object c is defined by an MDP, and morphisms
are given by the MDP homomorphisms of Definition 21 (Equations 1 and 2).

Proof: Note that the composition of two MDP homomorphisms h : M_1 → M_2 and h′ : M_2 → M_3 is once again an
MDP homomorphism h′ ∘ h : M_1 → M_3. The identity homomorphism is easy to define, and the composition of MDP
homomorphisms, being a composition of surjective mappings, is associative.
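As an illustration of Definition 21, the following minimal sketch checks Equations 1 and 2 for a candidate pair of maps on a small finite MDP. The dictionary encoding, the function name `is_mdp_homomorphism`, and the three-state example are assumptions made for this illustration only; surjectivity of f and g_s is not checked.

```python
def is_mdp_homomorphism(P, R, P_img, R_img, f, g, tol=1e-9):
    """Check Equations (1) and (2) for a candidate homomorphism <f, {g_s}>.

    P[(s, a)] is a dict {next_state: probability}; R[(s, a)] is a float.
    f maps states of M to states of M'; g[s] maps actions at s to actions at f(s).
    (Hypothetical encoding chosen for this sketch.)
    """
    states = set(f)
    for (s, a) in P:
        image_sa = (f[s], g[s][a])
        # Equation (2): expected rewards must agree.
        if abs(R_img[image_sa] - R[(s, a)]) > tol:
            return False
        # Equation (1): lump the probability mass over each block [s']_f.
        for s_img in set(f.values()):
            block = [s2 for s2 in states if f[s2] == s_img]
            lumped = sum(P[(s, a)].get(s2, 0.0) for s2 in block)
            if abs(P_img[image_sa].get(s_img, 0.0) - lumped) > tol:
                return False
    return True


# Hypothetical MDP M in which states s1 and s2 behave identically.
P = {("s0", "a"): {"s1": 0.5, "s2": 0.5},
     ("s1", "a"): {"s0": 1.0},
     ("s2", "a"): {"s0": 1.0}}
R = {("s0", "a"): 0.0, ("s1", "a"): 1.0, ("s2", "a"): 1.0}

# Candidate image M' in which s1 and s2 are merged into the single state v.
P_img = {("u", "a"): {"v": 1.0}, ("v", "a"): {"u": 1.0}}
R_img = {("u", "a"): 0.0, ("v", "a"): 1.0}
f = {"s0": "u", "s1": "v", "s2": "v"}
g = {s: {"a": "a"} for s in f}

print(is_mdp_homomorphism(P, R, P_img, R_img, f, g))
```

Changing the reward of s2 so that it differs from that of s1 breaks Equation 2, and the check then fails.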

4.2 UDM Category of Predictive State Representations

We now define the UDM (sub)category C_PSR of predictive state representations (Thon and Jaeger, 2015), based on
the notion of homomorphism for PSRs proposed in (Soni and Singh, 2007). Recall that a PSR is (in the
simplest case) a discrete controlled dynamical system, characterized by a finite set of actions A and observations
O. At each clock tick t, the agent takes an action a_t and receives an observation o_t ∈ O. A history is defined as a
sequence of actions and observations h = a_1 o_1 … a_k o_k. A test is a possible sequence of future actions and observations
t = a_1 o_1 … a_n o_n. A test is successful if the observations o_1 … o_n are observed in that order upon execution of the actions
a_1 … a_n. The probability P(t|h) is a prediction that a test t will succeed from history h.
A state ψ in a PSR is a vector of predictions of a suite of core tests {q_1, …, q_k}. The prediction vector ψ_h =
⟨P(q_1|h), …, P(q_k|h)⟩ is a sufficient statistic, in that it can be used to make predictions for any test. More precisely, for
every test t, there is a 1 × k projection vector m_t such that P(t|h) = ψ_h · m_t for all histories h. The entire predictive
state of a PSR can be denoted Ψ.
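The relation P(t|h) = ψ_h · m_t can be made concrete with a small numerical sketch. The prediction vectors, the projection vector, and the history names below are hypothetical numbers invented for this illustration, with k = 2 core tests; they are not taken from any particular dynamical system.

```python
def predict(psi_h, m_t):
    """Return P(t|h) as the inner product of the prediction vector and m_t."""
    return sum(p * w for p, w in zip(psi_h, m_t))


# Hypothetical prediction vectors <P(q1|h), P(q2|h)> for three histories.
psi = {
    "h1": [0.5, 0.1],
    "h2": [0.2, 0.4],
    "h3": [0.8, 0.3],
}

# Hypothetical projection vector for some test t. The same m_t is used for
# every history, which is what makes psi_h a sufficient statistic.
m_t = [0.5, 1.0]

predictions = {h: predict(vec, m_t) for h, vec in psi.items()}
for h, p in predictions.items():
    print(h, round(p, 3))
```

In terms of the system dynamics matrix, this says that the column of the test t is a fixed linear combination of the columns of the core tests.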
Definition 22. In the UDM category C_PSR defined by PSR objects, the morphism from object Ψ to another Ψ′ is
defined by a tuple of surjections ⟨f, v_ψ⟩, where f : Ψ → Ψ′ and v_ψ : A → A′ for all prediction vectors ψ ∈ Ψ, such
that

\[ P(\psi' \mid f(\psi), v_\psi(a)) = P(f^{-1}(\psi') \mid \psi, a) \tag{3} \]

for all ψ′ ∈ Ψ′, ψ ∈ Ψ, a ∈ A.
Theorem 3. The UDM category C_PSR is defined by making each object c represent a PSR, where the morphism
h : c → d between two PSRs is the PSR homomorphism defined in (Soni and Singh, 2007).

Proof: Once again, given the homomorphism in Definition 22, the UDM category C_PSR is easy to define,
given the surjectivity of the associated mappings f and v_ψ.


4.3 PSR Test Discovery as Horn Filling of a Simplicial Object

The extensive literature on PSRs has explored ways of constructing the set of core tests needed to serve as a sufficient
statistic for a full and faithful embedding of any dynamical system that can be represented as a PSR. Here, we just want
to illustrate that the process of elaborating a partial test t = a_1 o_1 … a_i o_i by adding a new candidate action-observation
pair a_{i+1} o_{i+1} can be viewed as filling the horn of a simplicial object.
Definition 23. A predictive horn of a PSR, Λ^n_i : Δ^op → C_PSR, is defined as the simplicial subset of the functor
mapping [n] to the category C_PSR defining a PSR model. Recall that the horn removes the interior of an n-simplex Δ^n
together with the face opposite its ith vertex.

Let us illustrate this notion of horn filling using the same diagram as shown earlier, but this time, interpreting the
morphisms shown below as tests in a PSR.

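[Diagram: the three horns of the 2-simplex on the vertices {0}, {1}, {2}: the outer horn Λ^2_0 (left), the inner horn Λ^2_1 (middle), and the outer horn Λ^2_2 (right).]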

The inner horn Λ^2_1 is the middle diagram above, and admits an easy solution to the "horn filling" problem of composing
the simplicial subsets. The two outer horns on either end pose a more difficult challenge. Stated in the language of
PSRs, the inner horn filling problem represents the problem of composing two tests, defined as images of abstract tests
t_01 : 0 → 1 and t_12 : 1 → 2 in the category defined by a PSR. Each image of an abstract test leads to an actual
test in a PSR model, for example F(t_01) = a_i o_i and F(t_12) = a_j o_j. These two tests can be composed to form a test
F(t_01 ∘ t_12) = F(t_02) = a_i o_i a_j o_j. The outer horn filling problem, in contrast, involves inducing a missing test that
"fills in" between a test of length 2 and a test of length 1.
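The two horn filling problems can be sketched on tests represented as tuples of action-observation pairs. This is only one possible reading: the tuple encoding and the function names `compose` and `fill_outer_horn` are assumptions made for this illustration, and the outer horn is treated here as the case where the edges 0 → 1 and 0 → 2 are known and the edge 1 → 2 is missing.

```python
def compose(t01, t12):
    """Inner horn: fill the edge 0 -> 2 by running t01 and then t12."""
    return t01 + t12


def fill_outer_horn(t02, t01):
    """Outer horn: given the tests on edges 0 -> 2 and 0 -> 1, propose the
    missing test on edge 1 -> 2, or None when t01 is not a prefix of t02."""
    if t02[:len(t01)] == t01:
        return t02[len(t01):]
    return None


# Unit tests of a single action-observation pair (symbolic placeholders).
t01 = (("a_i", "o_i"),)
t12 = (("a_j", "o_j"),)

t02 = compose(t01, t12)
print(t02)
print(fill_outer_horn(t02, t01))
print(fill_outer_horn(t02, t12))
```

Composition always succeeds, whereas the outer filler exists only when the shorter test happens to be a prefix of the longer one, which reflects why the outer horns are the harder case.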
First, let us define more precisely how to construct the associated simplicial object from a PSR. We can define this
construction following the system dynamics matrix in Figure 3. Both tests and histories are viewed as n-simplices,
constructed from sequences of action-observation pairs. For example, in Figure 3, the 1-simplex encodes a unit test or
history a_i o_j consisting of a single action-observation pair. The conditional probability P(t|h) of each test t given its history h is
defined using an enriched symmetric monoidal category P. Basically, we are treating the process of combining a test
t with a history h as a morphism from h to t ∘ h. The set of all morphisms Hom_C(h, t ∘ h) is defined as an enriched
symmetric monoidal category that can encode conditional probabilities. Bradley et al. (2022) give a detailed treatment
of how to model conditional probabilities for the problem of defining the compositional structure of deep NLP models,
where for each fragment of an English sentence, like h = "I went to the grocery store...", each possible completion,
such as t = "to get some milk", is assigned a conditional probability P(t|h). This problem is exactly the same as
defining the conditional probability of PSR tests, and we refer the reader to (Bradley et al., 2022) for additional details.
Definition 24. The nerve of a PSR C is the set of sequences of composable tests of length n, for n ≥ 1. Let N_n(C)
denote the set of sequences of composable tests

\[ \{ C_0 \xrightarrow{f_1} C_1 \xrightarrow{f_2} \cdots \xrightarrow{f_n} C_n \mid C_i \text{ is an object in } C,\ f_i \text{ is a morphism in } C \} \]

\[ N_{PSR}(C) \underset{F}{\overset{G}{\rightleftarrows}} C_{PSR} \]
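To make the sets N_n(C) concrete, the following sketch enumerates composable sequences of length n in a small finite category. The triple encoding (name, source, target), the morphism names, and the function `nerve` are assumptions of this illustration; identity morphisms are left out, so only non-degenerate sequences are listed.

```python
def nerve(morphisms, n):
    """Return all composable sequences of n morphisms, for n >= 1.

    Each morphism is a triple (name, source, target). A sequence is
    composable when every target matches the source of the next morphism.
    """
    chains = [(m,) for m in morphisms]
    for _ in range(n - 1):
        chains = [c + (m,) for c in chains for m in morphisms
                  if c[-1][2] == m[1]]
    return chains


# Hypothetical category with three objects and h equal to "g after f".
morphisms = [("f", "C0", "C1"), ("g", "C1", "C2"), ("h", "C0", "C2")]

for n in (1, 2, 3):
    names = [tuple(m[0] for m in chain) for chain in nerve(morphisms, n)]
    print(n, names)
```

For this example there are three sequences of length 1, a single sequence (f, g) of length 2, and none of length 3.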

We can use this result to abstractly define a universal PSR theorem, which states that any PSR can be defined completely
by its nerve.
Theorem 4. Universal PSR Theorem: The nerve functor defined by a PSR, N_• : Cat_PSR → Set, is fully faithful.
More specifically, there is a bijection θ defined as:

\[ \theta : \mathrm{Cat}(C_{PSR}, C'_{PSR}) \to \mathrm{Set}_\Delta(N_\bullet(C_{PSR}), N_\bullet(C'_{PSR})) \]

Theorem 5. PSRs form a quasicategory: The nerve functor defined by a PSR, N_• : Cat_PSR → Set, forms a
quasicategory (Joyal, 2002).

Proof: The proof of both theorems follows directly from more basic results establishing that the nerve of a category is a
fully faithful embedding of it. Since we showed above in Theorem 3 that PSRs form a category, the nerve embedding
is fully faithful. It also follows from (Joyal, 2002) that the nerve functor defined by the category of PSRs defines a
quasicategory. •
In particular, the nerve of a PSR defines a quasicategory, and this result shows that its nerve is a full and faithful
embedding of a category as a simplicial object. The significance of this theorem is that it shows that a considerable amount
of theoretical machinery in higher-order category theory can be brought to bear on the problem of structure discovery in
a PSR. There are many elaborations of this basic result, which we postpone to a subsequent paper. The definition
of the nerve leads to an important topological characterization of any category.
Definition 25. The classifying space of a PSR C is the topological space defined by its nerve functor |N• (C)|.

The classifying space of a category gives a way to define an algebraic invariant of a PSR, which we define below as its
singular homology.

4.4 Singular Homology of a PSR

We will now describe the singular homology of a PSR. First, we need to define more concretely the topological
n-simplex that provides a concrete way to attach a topology to a simplicial object. Our definitions below build on those
given in (Lurie, 2022). For each integer n ≥ 0, define the topological space |Δ^n| realized by the object Δ^n as

\[ |\Delta^n| = \{ (t_0, t_1, \ldots, t_n) \in \mathbb{R}^{n+1} : t_i \geq 0,\ t_0 + t_1 + \cdots + t_n = 1 \} \]

This is the familiar n-dimensional simplex over n + 1 variables. For any PSR model, its classifying space |N_•(C)| defines a
topological space. A singular n-simplex is a continuous mapping σ : |Δ^n| → |N_•(C)|. Every
singular n-simplex σ induces a collection of (n − 1)-dimensional simplices called faces, denoted as

\[ (d_i \sigma)(t_0, \ldots, t_{n-1}) = \sigma(t_0, t_1, \ldots, t_{i-1}, 0, t_i, \ldots, t_{n-1}) \]

Define the set of all morphisms Sing_n(|N_•(C)|) = Hom_Top(|Δ^n|, |N_•(C)|) as the set of singular n-simplices of |N_•(C)|.
Definition 26. For any topological space defined by a PSR |N_•(C)|, the singular homology groups H_*(|N_•(C)|; ℤ)
are defined as the homology groups of a chain complex

\[ \cdots \xrightarrow{\partial} \mathbb{Z}(\mathrm{Sing}_2(|N_\bullet(C)|)) \xrightarrow{\partial} \mathbb{Z}(\mathrm{Sing}_1(|N_\bullet(C)|)) \xrightarrow{\partial} \mathbb{Z}(\mathrm{Sing}_0(|N_\bullet(C)|)) \]

where ℤ(Sing_n(|N_•(C)|)) denotes the free Abelian group generated by the set Sing_n(|N_•(C)|) and the differential ∂ is
defined on the generators by the formula

\[ \partial(\sigma) = \sum_{i=0}^{n} (-1)^i d_i \sigma \]

Intuitively, a chain complex builds a sequence of vector spaces that can be used to construct an algebraic invariant
of a PSR from its classifying space by choosing the left k-module ℤ to be a vector space. Each differential ∂ then
becomes a linear transformation whose representation is constructed by modeling its effect on the basis elements in
each ℤ(Sing_n(|N_•(C)|)).
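Singular chain groups are infinitely generated, so they cannot be computed with directly. As a hedged, finite stand-in, the sketch below computes simplicial homology with rational coefficients for a small simplicial complex, using the same alternating-sum differential ∂(σ) = Σ (−1)^i d_i σ. The function names and the triangle example are assumptions of this illustration and are not the construction used in the paper.

```python
from fractions import Fraction


def boundary_matrix(k_simplices, faces):
    """Matrix of the differential from k-chains to (k-1)-chains."""
    index = {s: row for row, s in enumerate(faces)}
    matrix = [[Fraction(0)] * len(k_simplices) for _ in faces]
    for col, simplex in enumerate(k_simplices):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]  # d_i drops the i-th vertex
            matrix[index[face]][col] += Fraction((-1) ** i)
    return matrix


def rank(matrix):
    """Rank by Gaussian elimination over the rationals."""
    rows = [row[:] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def betti_numbers(simplices_by_dim):
    """simplices_by_dim[k] lists the k-simplices as sorted vertex tuples."""
    top = len(simplices_by_dim) - 1
    ranks = [0] * (top + 2)  # ranks[k] is the rank of the differential on k-chains
    for k in range(1, top + 1):
        if simplices_by_dim[k]:
            ranks[k] = rank(boundary_matrix(simplices_by_dim[k],
                                            simplices_by_dim[k - 1]))
    return [len(simplices_by_dim[k]) - ranks[k] - ranks[k + 1]
            for k in range(top + 1)]


vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
hollow = betti_numbers([vertices, edges])               # boundary of a triangle
filled = betti_numbers([vertices, edges, [(0, 1, 2)]])  # full 2-simplex
print(hollow, filled)
```

The hollow triangle has one connected component and one 1-dimensional hole, while filling in the 2-simplex removes the hole, which is the kind of invariant the homology groups record.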

5 Universal Causal Model


We now state more formally the Universal Causal Model (UCM) framework (Mahadevan, 2022a), which provides some
background to the simplicial objects formulation below.
Definition 27. A universal causal model (UCM) is defined as a tuple ⟨C, X, I, O, E⟩, where each of the components
is specified as follows:

1. C is a category of causal objects that interact with each other, and their patterns of interactions are captured by
a set of morphisms Hom_C(X, Y) between objects X and Y. Categories where the Hom morphisms can be
defined as a set are referred to as locally small categories, which are the ones of primary interest to us in this
paper. In particular, a V-enriched category C is one where the Hom morphisms are defined over the category
V, which allows capturing additional structure that might exist in the set of morphisms.


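[Figure 4 diagram: the morphism Exposure maps the object Activity (with elements Ballgame, Walking, Eating Out, Hiking) to the object Covid Test (with elements Positive, Negative); a morphism from the unit object 1 picks out the element Walking.]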

Figure 4: A simple category C for studying causal effects of exposure on getting Covid-19 infections. The morphism
Exposure maps a finite set object Activity to the (Boolean) object Covid Test, where the morphism could represent
any arbitrary measure-theoretic (Witsenhausen, 1975; Heymann et al., 2021), probabilistic (Pearl, 2009b), or topological
relationship (Mahadevan, 2021c). In category theory, an "element" of a set (such as Walking) is formally denoted by a
morphism from the unit object 1 (which has one object and one identity morphism) to the set that is defined by the
label it maps to. Causal interventions are viewed as morphisms into an object, such as determining the causal effect
of Activity on Covid Test by intervening on the Activity variable by setting it to Walking. In this paper, we model
interventions as elementary face and degeneracy mappings on simplicial objects.

2. X is a set of construction objects that allow constructing complex causal models from elementary parts.
Examples include monoidal tensor product of two categories M ⊗ M, Galois extensions and profunctors
between two partially ordered sets to represent resource models, co-limits and decorated cospans to represent
electric circuits, and so on. In this paper, we use the construction tools from the theory of simplicial objects.
3. I is an intervention category of intervention objects, which map from some category E of experimental designs
into the causal category C to implement an experimental design. Interventions can be simple, such as setting
the value of a variable to a specific value (see Figure 4), in which case I is a comma category, or they can
be more complex, such as setting the value of a variable to some arbitrary marginal distribution, such as the
edge-intervention model proposed by Janzing et al. (2013). There is a large literature on treatment planning in
causal inference, and many of the state of the art applications of causal inference, such as bipartite experiments
Zigler and Papadogeorgou (2018) use sophisticated experimental designs. Any of these experimental designs
can be accommodated in our functorial intervention framework. In this paper, we use the elementary face and
degeneracy operators defined earlier on simplicial objects.
4. O is a functor category of observation objects, which provide a (partial, perhaps noisy) view of causal objects in
C. For example, an object in the covariant functor category of presheaves Hom_C(X, −) of morphisms out of X
can be viewed as a measurement of the "state" of X. More general examples include bisimulation morphisms
used in the literature on reinforcement learning and software systems (Mahadevan, 2021a). Adam and Dahleh
(2019) propose modeling observation as an order-preserving function Φ from a system S modeled by a lattice
to an observation structure O, which is also a lattice. They define the composition of subsystems into a larger
system as a join S_1 ∨ S_2, where a system exhibits a generative effect if Φ(S_1 ∨ S_2) ≠ Φ(S_1) ∨ Φ(S_2).
5. E is a functor category of evaluation objects, which map a given causal model into an evaluation category
for the purposes of evaluating the effects of an intervention. For example, for a network economy
(Nagurney, 1999), the evaluation functor E is defined as a mapping from the causal network economy model
to the category of real-valued vector spaces defined by the vector field F of a variational inequality (VI)
⟨F(x*), (x − x*)⟩ ≥ 0 for all x in the feasible set. Here, the vector field F : C → ℝ^n is a functor from the causal category C of
network economy models (see Mahadevan (2021a)) to the category of n-dimensional Euclidean space. Solving
a VI means finding equilibrium flows x* on the network that represent stable patterns of trade. For example,
due to the recent war in Ukraine, global supply chains from grain to natural gas and oil have been disrupted,
and this intervention requires the global economic system to find a new equilibrium. The problem of studying
causal interventions in network economies was recently studied by us in a previous paper (Mahadevan, 2021b),
and we refer the reader to that paper for more details. In a causal DAG model, the evaluation functor maps the
causal category into the category of probability spaces (i.e., the rules of Pearl's do-calculus (Pearl, 2009b) give
conditions under which P(Y|do(X)) = P(Y|X)).
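The generative effect mentioned in item 4 above, Φ(S_1 ∨ S_2) ≠ Φ(S_1) ∨ Φ(S_2), can be illustrated with a toy example. Here the systems are sets of undirected edges ordered by inclusion (so the join is set union), and the observation Φ is the Boolean "node x is connected to node z" (so the join of observations is logical or). This particular lattice, the node names, and the function `connected` are assumptions chosen for this sketch, not the construction of Adam and Dahleh (2019).

```python
def connected(edges, a, b):
    """Observation Phi: does the system (a set of undirected edges) link a to b?"""
    reach = {a}
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            if u in reach and v not in reach:
                reach.add(v)
                changed = True
            elif v in reach and u not in reach:
                reach.add(u)
                changed = True
    return b in reach


S1 = {("x", "y")}   # subsystem 1: a single link x - y
S2 = {("y", "z")}   # subsystem 2: a single link y - z

phi_of_join = connected(S1 | S2, "x", "z")                        # Phi(S1 v S2)
join_of_phi = connected(S1, "x", "z") or connected(S2, "x", "z")  # Phi(S1) v Phi(S2)
generative_effect = phi_of_join != join_of_phi
print(phi_of_join, join_of_phi, generative_effect)
```

Neither subsystem connects x to z on its own, but their join does, so the observation of the composite cannot be recovered from the observations of the parts.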

13
A PREPRINT - S EPTEMBER 15, 2022

5.1 Causal Inference over Simplicial Objects

We now turn to formulating causal discovery in terms of adjoint functors between a category, defining a universal
causal model (Mahadevan, 2022a), and a simplicial object. We define causal interventions as images of the elementary
injective maps d_i : [n] → [n + 1], which skip element i in their image, and surjective maps s_j : [n] → [n − 1],
which repeat element j, in the category of ordinal numbers. Defining causal inference over simplicial objects requires
choosing a compositional structure over higher-order simplicial objects n ≥ 2, which requires a significant
amount of higher-order category theory, including weak Kan complexes (Boardman and Vogt, 1973), quasicategories
(Joyal, 2002) and ∞-categories (Lurie, 2022), which were reviewed above. We pose the problem of causal discovery
in simplicial objects in terms of adjoints between categories and simplicial objects. The nerve of a category is a fully
faithful right adjoint from a category C to an associated simplicial object N_C. Its left adjoint is a destructive functor that
only preserves information up to the 2-simplices. We relate these ideas to existing literature on causal discovery from
conditional independence oracles on datasets.
Many previous studies of causal discovery from interventions, including the conservative family of intervention targets
(Hauser and Bühlmann, 2012), path queries (Bello and Honorio, 2018), and separating systems of finite sets or graphs
(Eberhardt, 2008; Hauser and Bühlmann, 2012; Kocaoglu et al., 2017; Katona, 1966; Mao-cheng, 1984), can all be
viewed as imposing a topology on the sets of variables being intervened on.

5.2 Causal Horns and Conditional Independence

We now introduce the concept of causal horns, subsets of simplicial objects, which like the case of PSRs above,
represent fragments of the complete simplicial object that is constructed during the process of structure discovery. In the
literature on causal discovery, it is common to use conditional independence as a guide to causal discovery. Conditional
independence is a symmetric property, as we will see below from examining a well-known axiom system for conditional
independence in statistics.
Definition 28. A causal horn Λ^n_i : Δ^op → C is defined as

\[ (\Lambda^n_i)([m]) = \{ \alpha \in \mathrm{Hom}_\Delta([m], [n]) : [n] \not\subseteq \alpha([m]) \cup \{i\} \} \]

Intuitively, a causal horn Λ^n_i is a fragment of a causal model (see Figure 6) that can be viewed as the simplicial subset
that results from removing the interior of the n-simplex Δ^n together with the face opposite its ith vertex. Note that C_n
defines the n-simplex associated with the horn of a causal model defined by the category C (recall that a simplicial
object is a contravariant functor from the category of ordinal numbers Δ to the category C defining causal models).
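The set-builder formula in Definition 28 can be enumerated directly for small n. In the sketch below, an order-preserving map α : [m] → [n] is represented by its non-decreasing tuple of values; the function names `horn_simplices` and `nondegenerate` are assumptions of this illustration.

```python
from itertools import combinations_with_replacement


def horn_simplices(n, i, m):
    """All order-preserving maps alpha: [m] -> [n] that belong to the horn,
    i.e. those for which [n] is not contained in image(alpha) union {i}."""
    full = set(range(n + 1))
    result = []
    for alpha in combinations_with_replacement(range(n + 1), m + 1):
        if not full <= (set(alpha) | {i}):
            result.append(alpha)
    return result


def nondegenerate(simplices):
    """Keep only the injective maps (no repeated vertex)."""
    return [a for a in simplices if len(set(a)) == len(a)]


# Edges of the inner horn on the 2-simplex and of the outer horn at vertex 2.
inner_edges = nondegenerate(horn_simplices(2, 1, 1))
outer_edges = nondegenerate(horn_simplices(2, 2, 1))
inner_triangles = nondegenerate(horn_simplices(2, 1, 2))
print(inner_edges, outer_edges, inner_triangles)
```

The inner horn keeps the edges 0 → 1 and 1 → 2 and omits 0 → 2, the outer horn at vertex 2 keeps 0 → 2 and 1 → 2, and neither contains a non-degenerate 2-simplex, matching the description of a horn as a simplex with its interior and one face removed.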
Definition 29. A separoid (Dawid, 2001) defines a category over a preordered set (S, ≤), where ≤ is reflexive and
transitive, equipped with a ternary relation ⫫ on triples (x, y, z), where x, y, z ∈ S, satisfying the following properties:

• S1: (S, ≤) is a join semi-lattice.
• P1: x ⫫ y | x
• P2: x ⫫ y | z ⇒ y ⫫ x | z
• P3: x ⫫ y | z and w ≤ y ⇒ x ⫫ w | z
• P4: x ⫫ y | z and w ≤ y ⇒ x ⫫ y | (z ∨ w)
• P5: x ⫫ y | z and x ⫫ w | (y ∨ z) ⇒ x ⫫ (y ∨ w) | z

A strong separoid also defines a category. A strong separoid is defined over a lattice S, which has a meet operation ∧
in addition to a join ∨, and satisfies an additional axiom:

• P6: If z ≤ y and w ≤ y, then x ⫫ y | z and x ⫫ y | w ⇒ x ⫫ y | (z ∧ w)

It is well known that causal DAG structures can only be recovered up to an equivalence class under observation.
For example, the three DAG models A → B → C, A ← B ← C, and the diverger A ← B → C cannot be
discriminated from observations alone, because they define the same conditional independence property A ⫫ C | B.
Parameterizing these models as probability distributions implies that Bayes' rule allows inferring P(B|A) from P(A|B),
thus these models are equivalent from purely observational data. Discovering the exact structure requires making
interventions (e.g., intervening on a variable is usually done by clamping its value X = x, thereby deleting any
arrows entering the variable from other objects). The same problem applies as well to non-graphical representations
of conditional independence, such as integer-valued multisets, or imsets (Studeny, 2010), defined as an integer-valued
function u : 𝒫(ℤ) → ℤ from the power set of integers 𝒫(ℤ) to the integers ℤ. An imset is defined over a partially ordered set
(poset), defined as a distributive lattice of disjoint (or non-disjoint) subsets of variables. The bottom element is denoted
∅, and the top element represents the complete set of variables N. A full discussion of the probabilistic representations
induced by imsets is given in (Studeny, 2010). We will only focus on the aspects of imsets that relate to their conditional
independence structure, and their topological structure as defined by the poset. A combinatorial imset is defined as:
\[ u = \sum_{A \subset N} c_A \delta_A \]

where c_A is an integer, δ_A is the characteristic function for subset A, and A potentially ranges over all subsets of N.
An elementary imset is defined over (a, b ⫫ A), where a, b are singletons, and A ⊂ N \ {a, b}. A structural imset
is defined as one where the coefficients can be rational numbers. For a general DAG model G = (V, E), an imset in
standard form (Studeny, 2010) is defined as

\[ u_G = \delta_V - \delta_\emptyset + \sum_{i \in V} \left( \delta_{Pa_i} - \delta_{\{i\} \cup Pa_i} \right) \]

Figure 5 shows an example imset for DAG models over three variables, defined by an integer valued function over the
lattice of subsets. Each of the three DAG models shown defines exactly the same imset function. Studeny (2010) gives
a detailed analysis of imsets as a non-graphical representation of conditional independence.
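The claim that the three DAG models share one imset can be checked by evaluating the standard-form expression u_G = δ_V − δ_∅ + Σ_i (δ_{Pa_i} − δ_{{i} ∪ Pa_i}) directly. The sketch below is a minimal illustration; the function name `standard_imset` and the parent-set encoding of a DAG are assumptions made here, and a collider is added only for contrast.

```python
from collections import Counter


def standard_imset(parents):
    """Standard imset of a DAG given as {variable: set of parents}.

    Returns a dict from subsets (frozensets) to their nonzero coefficients.
    """
    V = frozenset(parents)
    u = Counter()
    u[V] += 1
    u[frozenset()] -= 1
    for i, pa in parents.items():
        pa = frozenset(pa)
        u[pa] += 1          # + delta_{Pa_i}
        u[pa | {i}] -= 1    # - delta_{{i} union Pa_i}
    return {A: c for A, c in u.items() if c != 0}


chain = standard_imset({"A": set(), "B": {"A"}, "C": {"B"}})      # A -> B -> C
reverse = standard_imset({"C": set(), "B": {"C"}, "A": {"B"}})    # A <- B <- C
fork = standard_imset({"B": set(), "A": {"B"}, "C": {"B"}})       # A <- B -> C
collider = standard_imset({"A": set(), "C": set(), "B": {"A", "C"}})  # A -> B <- C

print(chain == reverse == fork, chain == collider)
```

The chain, the reversed chain, and the diverger all reduce to δ_{ABC} − δ_{AB} − δ_{BC} + δ_B, whereas the collider A → B ← C yields a different imset, consistent with it encoding a different conditional independence structure.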

Figure 5: An illustration of an integer-valued multiset (imset) consisting of a lattice of subsets over three elements for
representing conditional independences in DAG models. All three DAG models are represented by the same imset. We
can view an imset as a functor from a simplicial object (shown as the dark blue filled triangle) to a left k module. Note
that a causal intervention on a DAG generates a horn of the simplicial object shown in dark blue.
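[Figure 5 diagram: the lattice of subsets of {a, b, c} labeled with imset values: 1 for {a,b,c}; −1, 0, −1 for {a,b}, {a,c}, {b,c}; 0, 1, 0 for {a}, {b}, {c}; and 0 for the empty set; shown alongside the corresponding DAG models and the filled 2-simplex on a, b, c.]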

Figure 6 illustrates the space of possible causal DAG models over 3 variables, with their associated imset representations
(see (Studený et al., 2010b) for details). Each vertex in this figure shows a potential DAG model. To explain this
representation in more detail, the undirected edges represent the essential graph, namely for each edge of the form a − b,
there are two possible structures that it represents a → b and a ← b. These two structures are homotopically equivalent,
that is, they define an equivalence class on the space of all models.

5.3 Causal Discovery as Adjoint Functors

Following the case for PSR structure discovery, we can also formulate causal discovery in terms of a pair of adjoint
functors between a category C and a simplicial object X.

\[ X_n \underset{F}{\overset{G}{\rightleftarrows}} C \]

Figure 6: Causal discovery through the space of all models over 3 variables, shown with their associated imset
representations (Studený et al., 2010b). Each candidate DAG defines a causal horn, a simplicial subobject of the
complete simplex Δ^2, and the process of causal structure discovery can be viewed in terms of the abstract horn
filling problem defined above for higher-order categories.
[Figure 6 diagram: the candidate DAG models over the three variables A, B, C, each annotated with its imset representation over the subsets ∅, a, b, c, ab, ac, abc.]

In general, the functor G from a simplicial object X to a category C can be lossy. For example, we can define the
objects of C to be the elements of X_0, and the morphisms of C as the elements f ∈ X_1, where f : a → b, and
d_0 f = a, and d_1 f = b, and s_0 a, for a ∈ X_0, as defining the identity morphisms 1_a. Composition in this case can be defined
as the free algebra defined over elements of X_1, subject to the constraints given by elements of X_2. For example, if
x ∈ X_2, we can impose the requirement that d_1 x = d_0 x ∘ d_2 x. Such a definition of the left adjoint would be quite
lossy because it only preserves the structure of the simplicial object X up to the 2-simplices. The right adjoint from a
category to its associated simplicial object, in contrast, constructs a full and faithful embedding of a category into a
simplicial set. In particular, the nerve of a category is such a right adjoint.
Consider the three horns defined below, once again, and let us relate them to the partial DAG structures shown in
Figure 6. Notice that all three horns are illustrated by candidate DAG structures; for example, the rightmost horn below
corresponds to the partial DAG $A \to C \leftarrow B$, under the mapping $0 \mapsto A$, $1 \mapsto B$, $2 \mapsto C$. The edge that represents
moving from this partial DAG to the complete DAG at the bottom of Figure 6 corresponds exactly to the problem of
filling the outer horn shown below on the right.

[Diagram: the three horns of the 2-simplex, each drawn on the vertices {0}, {1}, {2}.]

Let us consider a more complex case of horn filling over a 3-simplex $f : \Lambda^3_2 \to C$ for any category $C$ that represents a
causal model. The “data” that is used to solve the “horn filling” problem is as follows:

• There are 4 objects, represented as 0-simplices $x_0$, $x_1$, $x_2$ and $x_3$.
• There are 6 1-simplices, which we can define as potential causal arrows $f_{0,1}$, $f_{1,2}$, $f_{2,3}$, $f_{0,2}$, $f_{0,3}$ and $f_{1,3}$.
Each of these arrows $f_{i,j}$ defines two 0-faces $x_i$ and $x_j$, the basic units in a causal study.
• There are only 3 2-simplices $f_{0,2,3}$, $f_{0,1,2}$ and $f_{1,2,3}$. As we are dealing with the horn $\Lambda^3_2$, the 2-simplex
$f_{0,1,3}$ is missing. Each of the given 2-simplices defines a homotopy equivalent composition, for example
$f_{0,2,3} : f_{2,3} \circ f_{0,2} \sim f_{0,3}$.
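
The face counts in the list above can be checked mechanically. The following is a minimal Python sketch, assuming the standard combinatorial description of the horn $\Lambda^n_k$ as every face of the $n$-simplex except the top cell and the face opposite vertex $k$; the function name `horn_faces` is hypothetical and only the non-degenerate faces are listed.

```python
from itertools import combinations


def horn_faces(n, k):
    """Non-degenerate faces of the horn Lambda^n_k, grouped by dimension.

    A face (a subset of the vertices 0..n) belongs to the horn exactly when
    it omits some vertex other than k. Equivalently, the only excluded faces
    are the full simplex and the face opposite vertex k.
    """
    vertices = set(range(n + 1))
    faces = {}
    for dim in range(n + 1):
        faces[dim] = [
            s
            for s in combinations(range(n + 1), dim + 1)
            if set(s) | {k} != vertices
        ]
    return faces


faces = horn_faces(3, 2)
print(len(faces[0]), "0-simplices:", faces[0])
print(len(faces[1]), "1-simplices:", faces[1])
print(len(faces[2]), "2-simplices:", faces[2])
print("3-simplices:", faces[3])
```

For `horn_faces(3, 2)` the 2-simplices are `(0, 1, 2)`, `(0, 2, 3)` and `(1, 2, 3)`, so the face on vertices 0, 1, 3 and the 3-simplex itself are exactly what horn filling has to supply. Calling `horn_faces(2, 2)` leaves the edges `(0, 2)` and `(1, 2)`, which matches the partial DAG $A \to C \leftarrow B$ discussed earlier.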


The problem of horn filling is to deduce the missing 2-simplex $f_{0,1,3}$ and the missing 3-simplex $f_{0,1,2,3}$ by defining a
homotopy of compositions of homotopies:

$$f_{0,1,2,3} : f_{0,2,3} \circ f_{0,1,2} \equiv f_{0,1,3} \circ f_{1,2,3}$$


Definition 30. The nerve of a causal model $C$ is the set of sequences of composable arrows of length $n$, for $n > 1$. Let
$N_n(C)$ denote the set of sequences of composable arrows

$$\{C_0 \xrightarrow{f_1} C_1 \xrightarrow{f_2} \cdots \xrightarrow{f_n} C_n \mid C_i \text{ is an object in } C,\ f_i \text{ is a morphism in } C\}$$
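
To make Definition 30 concrete, here is a minimal Python sketch that enumerates $N_n(C)$ for a very small causal model. It assumes, as a simplification not made in the text, that $C$ is the thin category generated by a DAG (at most one morphism between any two objects, given by reachability), so a sequence of $n$ composable arrows is determined by its chain of $n + 1$ objects. The function names `reachability` and `nerve` are hypothetical, and identity arrows are included, so degenerate simplices are counted.

```python
from itertools import product


def reachability(objects, edges):
    """Reflexive-transitive closure of the DAG edges: the morphisms of C."""
    leq = {(x, x) for x in objects} | set(edges)
    changed = True
    while changed:
        changed = False
        # Compose every pair of known morphisms (a -> b, then b -> d).
        for (a, b), (c, d) in product(list(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq


def nerve(objects, edges, n):
    """All chains C_0 -> C_1 -> ... -> C_n of composable arrows in C."""
    leq = reachability(objects, edges)
    chains = [(x,) for x in objects]
    for _ in range(n):
        chains = [
            chain + (y,)
            for chain in chains
            for y in objects
            if (chain[-1], y) in leq
        ]
    return chains


# The collider A -> C <- B used as a running example of a partial DAG.
objects = ["A", "B", "C"]
edges = [("A", "C"), ("B", "C")]
for n in range(3):
    print(n, len(nerve(objects, edges, n)))
```

For the collider this yields 3 objects, 5 arrows (3 identities plus the 2 edges) and 7 composable pairs. For a category that is not thin, each chain would also have to record which parallel morphism is used at every step.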

We can define the causal discovery problem in terms of the left adjoint from the nerve of a causal model to the
category. In general, the functor mapping a simplicial object to a category is lossy, because it only preserves properties
of the simplicial object up to a certain level. In the case of causal discovery, we can exploit symmetry properties from
conditional independence to design this functorial mapping, as was discussed above.

$$N_\bullet(C) \underset{F}{\overset{G}{\rightleftarrows}} C.$$

The definition of nerve leads to an important topological characterization of any causal category.
Definition 31. The classifying space of a causal model C is the topological space |N• (C)|.

The classifying space of a causal model gives a way to define an algebraic invariant.

5.4 The Singular Homology of a Causal Model

We will now describe the singular homology of a causal model, using as an example the imset representation shown
earlier in Figure 6.
Definition 32. For any topological space $X$ defined by a causal model, the singular homology groups $H_*(X; \mathbf{Z})$ are
defined as the homology groups of a chain complex

$$\cdots \xrightarrow{\partial} \mathbf{Z}(\mathrm{Sing}_2(X)) \xrightarrow{\partial} \mathbf{Z}(\mathrm{Sing}_1(X)) \xrightarrow{\partial} \mathbf{Z}(\mathrm{Sing}_0(X))$$

where $\mathbf{Z}(\mathrm{Sing}_n(X))$ denotes the free Abelian group generated by the set $\mathrm{Sing}_n(X)$ and the differential $\partial$ is defined on
the generators by the formula

$$\partial(\sigma) = \sum_{i=0}^{n} (-1)^i d_i \sigma$$
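
As a quick check that the formula defines a chain complex, the case $n = 2$ can be worked by hand. The derivation below assumes the standard simplicial identity $d_i d_j = d_{j-1} d_i$ for $i < j$, which is not restated in this section.

```latex
\begin{align*}
\partial(\sigma) &= d_0\sigma - d_1\sigma + d_2\sigma, \\
\partial(\partial(\sigma))
  &= (d_0 d_0\sigma - d_1 d_0\sigma)
   - (d_0 d_1\sigma - d_1 d_1\sigma)
   + (d_0 d_2\sigma - d_1 d_2\sigma) \\
  &= d_0 d_0\sigma - d_1 d_0\sigma - d_0 d_0\sigma + d_1 d_1\sigma
   + d_1 d_0\sigma - d_1 d_1\sigma
   && \text{using } d_0 d_1 = d_0 d_0,\ d_0 d_2 = d_1 d_0,\ d_1 d_2 = d_1 d_1 \\
  &= 0.
\end{align*}
```

The alternating signs are what make the terms cancel in pairs, so $\partial \circ \partial = 0$ and the homology groups are well defined.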

Intuitively, a chain complex builds a sequence of vector spaces that can be used to construct an algebraic invariant of
a topological space by choosing the left $k$-module $\mathbf{Z}$ to be a vector space. Each differential $\partial$ then becomes a linear
transformation whose representation is constructed by modeling its effect on the basis elements in each $\mathbf{Z}(\mathrm{Sing}_n(X))$.
Example 10. Let us illustrate the singular homology groups defined by an integer-valued multiset (imset) (Studeny, 2010)
used to model conditional independence. Imsets over a DAG of three variables $N = \{a, b, c\}$, shown previously as
Figure 5, can be viewed as a finite discrete topological space. For this topological space $X$, the singular homology
groups $H_*(X; \mathbf{Z})$ are defined as the homology groups of a chain complex

$$\mathbf{Z}(\mathrm{Sing}_3(X)) \xrightarrow{\partial} \mathbf{Z}(\mathrm{Sing}_2(X)) \xrightarrow{\partial} \mathbf{Z}(\mathrm{Sing}_1(X)) \xrightarrow{\partial} \mathbf{Z}(\mathrm{Sing}_0(X))$$

where $\mathbf{Z}(\mathrm{Sing}_i(X))$ denotes the free Abelian group generated by the set $\mathrm{Sing}_i(X)$ and the differential $\partial$ is defined on
a generator $\sigma \in \mathrm{Sing}_n(X)$ by the formula

$$\partial(\sigma) = \sum_{i=0}^{n} (-1)^i d_i \sigma$$


The set $\mathrm{Sing}_n(X)$ is the set of all morphisms $\mathrm{Hom}_{\mathrm{Top}}(|\Delta^n|, X)$. For an imset over the three variables $N = \{a, b, c\}$,
we can define the singular 3-simplex $\sigma$ as:

$$\sigma : |\Delta^3| \to X \quad \text{where} \quad |\Delta^3| = \{(t_0, t_1, t_2, t_3) \in [0, 1]^4 : t_0 + t_1 + t_2 + t_3 = 1\}$$

The 3-simplex $\sigma$ has a collection of faces denoted as $d_0\sigma$, $d_1\sigma$, $d_2\sigma$ and $d_3\sigma$. If we pick the left $k$-module $\mathbf{Z}$ as the
vector space over real numbers $\mathbb{R}$, then the above chain complex represents a sequence of vector spaces that can be used
to construct an algebraic invariant of a topological space defined by the integer-valued multiset. Each differential $\partial$ then
becomes a linear transformation whose representation is constructed by modeling its effect on the basis elements in
each $\mathbf{Z}(\mathrm{Sing}_n(X))$. An alternate approach to constructing a chain homology for an integer-valued multiset is to use
Möbius inversion to define the chain complex in terms of the nerve of a category (see our recent work on categoroids
(Mahadevan, 2022b) for details).
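
The remark that each differential becomes a linear transformation acting on basis elements can be illustrated in a few lines of Python. This is only a sketch under stated assumptions: it computes the simplicial homology of a small finite complex with rational coefficients, as a stand-in for the singular homology of Definition 32, and it does not model the imset of Example 10. The names `boundary_matrix`, `rank` and `betti_numbers` are hypothetical.

```python
from fractions import Fraction


def boundary_matrix(k_simplices, km1_simplices):
    """Matrix of the differential from k-chains to (k-1)-chains."""
    row_of = {s: r for r, s in enumerate(km1_simplices)}
    matrix = [[Fraction(0)] * len(k_simplices) for _ in km1_simplices]
    for col, simplex in enumerate(k_simplices):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]  # d_i drops the i-th vertex
            matrix[row_of[face]][col] += (-1) ** i
    return matrix


def rank(matrix):
    """Rank over the rationals by Gauss-Jordan elimination."""
    rows = [row[:] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def betti_numbers(simplices_by_dim):
    """Betti numbers b_k = dim ker(d_k) - rank(d_{k+1}) for each dimension k."""
    top = max(simplices_by_dim)
    ranks = {0: 0, top + 1: 0}  # no differential out of dimension 0 or into top
    for k in range(1, top + 1):
        ranks[k] = rank(
            boundary_matrix(simplices_by_dim[k], simplices_by_dim[k - 1])
        )
    return [
        len(simplices_by_dim[k]) - ranks[k] - ranks[k + 1] for k in range(top + 1)
    ]


vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
hollow = {0: vertices, 1: edges}
filled = {0: vertices, 1: edges, 2: [(0, 1, 2)]}
print(betti_numbers(hollow))
print(betti_numbers(filled))
```

The hollow triangle has one connected component and one independent loop, while adding the 2-simplex kills the loop. In this small setting the difference between the two lists is the kind of algebraic invariant the chain complex is meant to provide.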

6 Summary
In this paper we presented a unified formalism for structure discovery in causal inference and reinforcement learning
using the framework of simplicial objects, contravariant functors from the category of ordinal numbers into any
category, which form the basis for higher-order category theory. We showed that structure discovery in both causal
inference and RL can be defined as filling the horns of simplicial objects, and showed that a number of constructions in
higher-order category theory, such as extensions and lifting problems, weak Kan complexes, the nerve of a category, and
chain complexes, could be brought to bear on the design of new algorithms. Specifically, we showed that fragments of
causal models that are equivalent under conditional independence – defined as causal horns – as well as fragments of
potential tests in a predictive state representation – defined as predictive horns – are both special cases of horns of a
simplicial object of order n, resulting from the removal of the interior and the face opposite a particular vertex in an
n-simplex. Causal and predictive-state horns both result from interventions on simplicial objects, defined as images
of a sequence of elementary order-preserving morphisms from the category of ordinal numbers ∆ into an underlying
category encoding a universal causal or decision model. Latent structure discovery in both settings involves the same
fundamental mathematical problem of finding extensions of horns of simplicial objects through solving lifting problems
in commutative diagrams, and exploiting weak homotopies that define higher-order symmetries. The problem of filling
“inner” horns is solved using quasicategories and weak Kan extensions, whereas filling “outer” horns requires ideas
from ∞-category theory. We define the common abstract problem of structure discovery in terms of adjoint functors
between a universal causal or decision model category and its simplicial object representation. In general, the left
adjoint functor from a simplicial object X to a category C is lossy, preserving only relationships up to a certain order
defined by homotopical equivalences. In contrast, the right adjoint defining the nerve of a category constructs a lossless
encoding of a category as a simplicial object.

References
J.P. May. Simplicial Objects in Algebraic Topology. Chicago Lectures in Mathematics. University of Chicago Press,
1992. ISBN 9780226511818. URL https://books.google.com/books?id=QGjwV0gyQnIC.
M. Boardman and Rainer Vogt. Homotopy invariant algebraic structures on topological spaces. Springer, Berlin, 1973.
A. Joyal. Quasi-categories and Kan complexes. Journal of Pure and Applied Algebra, 175(1):207–222, 2002. ISSN 0022-4049.
doi:10.1016/S0022-4049(02)00135-4. URL https://www.sciencedirect.com/science/article/pii/S0022404902001354.
Special Volume celebrating the 70th birthday of Professor Max Kelly.
Jacob Lurie. Kerodon. https://kerodon.net, 2022.
Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, USA, 2nd edition, 2009a.
ISBN 052189560X.
Guido W. Imbens and Donald B. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences: An
Introduction. Cambridge University Press, USA, 2015. ISBN 0521885884.
Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search, Second Edition. Adaptive
computation and machine learning. MIT Press, 2000. ISBN 978-0-262-19440-2.
Satinder P. Singh, Michael R. James, and Matthew R. Rudary. Predictive state representations: A new theory for
modeling dynamical systems. In David Maxwell Chickering and Joseph Y. Halpern, editors, UAI ’04, Proceedings of
the 20th Conference in Uncertainty in Artificial Intelligence, Banff, Canada, July 7-11, 2004, pages 512–518. AUAI
Press, 2004.


Richard S. Sutton and Andrew G. Barto. Reinforcement learning - an introduction. Adaptive computation and machine
learning. MIT Press, 1998. ISBN 978-0-262-19398-6. URL https://www.worldcat.org/oclc/37293240.
Peter Van Overschee and Bart De Moor. Subspace identification for linear systems. Theory, implementation, applications.
Incl. 1 disk, volume xiv, pages xiv + 254. Springer, 01 1996. ISBN 0-7923-9717-7. doi:10.1007/978-1-4613-0465-4.
N. Chomsky and M.P. Schützenberger. The algebraic theory of context-free languages. In P. Braffort and D. Hirschberg,
editors, Computer Programming and Formal Systems, volume 35 of Studies in Logic and the Foundations of
Mathematics, pages 118–161. Elsevier, 1963. doi:10.1016/S0049-237X(08)72023-8.
URL https://www.sciencedirect.com/science/article/pii/S0049237X08720238.
Y. Give’on and M.A. Arbib. Algebra automata ii: The categorical framework for dynamic analysis. Information
and Control, 12(4):346–370, 1968. ISSN 0019-9958. doi:10.1016/S0019-9958(68)90381-1.
URL https://www.sciencedirect.com/science/article/pii/S0019995868903811.
Judea Pearl. Probabilistic reasoning in intelligent systems - networks of plausible inference. Morgan Kaufmann series
in representation and reasoning. Morgan Kaufmann, 1989.
Steffen L. Lauritzen and Thomas S. Richardson. Chain graph models and their causal interpretations. Journal of the
Royal Statistical Society: Series B (Statistical Methodology), 64(3):321–348, 2002. doi:10.1111/1467-9868.00340.
URL https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1467-9868.00340.
Patrick Forré and Joris M. Mooij. Markov properties for graphical models with cycles and latent variables, 2017.
Robin J. Evans. Margins of discrete Bayesian networks. The Annals of Statistics, 46(6A):2623–2656, 2018.
doi:10.1214/17-AOS1631. URL https://doi.org/10.1214/17-AOS1631.
A. Philip Dawid. Separoids: A mathematical framework for conditional independence and irrelevance. Ann. Math.
Artif. Intell., 32(1-4):335–372, 2001. doi:10.1023/A:1016734104787. URL https://doi.org/10.1023/A:1016734104787.
Milan Studený, Jiří Vomlel, and Raymond Hemmecke. A geometric view on learning Bayesian network
structures. International Journal of Approximate Reasoning, 51(5):573–586, 2010a. ISSN 0888-613X.
doi:10.1016/j.ijar.2010.01.014. URL https://www.sciencedirect.com/science/article/pii/S0888613X10000216.
PGM-2008.
Sridhar Mahadevan. On the universality of diagrams for causal inference and the causal reproducing property, 2022a.
URL https://arxiv.org/abs/2207.02917.
Sridhar Mahadevan. Universal decision models. CoRR, abs/2110.15431, 2021a. URL https://arxiv.org/abs/2110.15431.
J.P. May. A Concise Course in Algebraic Topology. Chicago Lectures in Mathematics. University of Chicago Press,
1999. ISBN 9780226511832. URL https://books.google.com/books?id=g8SG03R1bpgC.
Saunders MacLane. Categories for the Working Mathematician. Springer-Verlag, New York, 1971. Graduate Texts in
Mathematics, Vol. 5.
B. Richter. From Categories to Homotopy Theory. Cambridge Studies in Advanced Mathematics. Cambridge University
Press, 2020. ISBN 9781108479622. URL https://books.google.com/books?id=pnzUDwAAQBAJ.
Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in
Probability and Statistics. Wiley, 1994. ISBN 978-0-47161977-2. doi:10.1002/9780470316887.
URL https://doi.org/10.1002/9780470316887.
Balaraman Ravindran and Andrew G. Barto. SMDP homomorphisms: An algebraic approach to abstraction in
semi-markov decision processes. In Georg Gottlob and Toby Walsh, editors, IJCAI-03, Proceedings of the Eighteenth
International Joint Conference on Artificial Intelligence, Acapulco, Mexico, August 9-15, 2003, pages 1011–1018.
Morgan Kaufmann, 2003. URL http://ijcai.org/Proceedings/03/Papers/145.pdf.
Brendan Fong and David I Spivak. Seven sketches in compositionality: An invitation to applied category theory, 2018.
URL http://arxiv.org/abs/1803.05316.
Tai-Danae Bradley, John Terilla, and Yiannis Vlassopoulos. An enriched category theory of language: From syntax
to semantics. La Matematica, mar 2022. doi:10.1007/s44007-022-00021-2.
URL https://doi.org/10.1007%2Fs44007-022-00021-2.
Michael R. Thon and Herbert Jaeger. Links between multiplicity automata, observable operator models and predictive
state representations: a unified learning framework. J. Mach. Learn. Res., 16:103–147, 2015.
URL http://dl.acm.org/citation.cfm?id=2789276.


Vishal Soni and Satinder P. Singh. Abstraction in predictive state representations. In Proceedings of the Twenty-Second
AAAI Conference on Artificial Intelligence, July 22-26, 2007, Vancouver, British Columbia, Canada, pages 639–644.
AAAI Press, 2007. URL http://www.aaai.org/Library/AAAI/2007/aaai07-101.php.
Dominik Janzing, David Balduzzi, Moritz Grosse-Wentrup, and Bernhard Schölkopf. Quantifying causal influences.
The Annals of Statistics, 41(5):2324–2358, 2013. doi:10.1214/13-AOS1145. URL https://doi.org/10.1214/13-AOS1145.
Corwin M. Zigler and Georgia Papadogeorgou. Bipartite causal inference with interference, 2018.
Elie M. Adam and Munther A. Dahleh. Generativity and interactional effects: an overview, 2019.
URL https://arxiv.org/abs/1911.10406.
A. Nagurney. Network Economics: A Variational Inequality Approach. Kluwer Academic Press, 1999.
Sridhar Mahadevan. Causal inference in network economics. CoRR, abs/2109.11344, 2021b.
URL https://arxiv.org/abs/2109.11344.
Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, USA, 2nd edition, 2009b.
ISBN 052189560X.
H. S. Witsenhausen. The intrinsic model for discrete stochastic control: Some open problems. In A. Bensoussan and
J. L. Lions, editors, Control Theory, Numerical Methods and Computer Systems Modelling, pages 322–335, Berlin,
Heidelberg, 1975. Springer Berlin Heidelberg. ISBN 978-3-642-46317-4.
Benjamin Heymann, Michel de Lara, and Jean-Philippe Chancelier. Causal inference theory with information
dependency models, 2021. URL https://arxiv.org/abs/2108.03099.
Sridhar Mahadevan. Causal homotopy, 2021c. URL https://arxiv.org/abs/2112.01847.
Alain Hauser and Peter Bühlmann. Characterization and greedy learning of interventional markov equivalence classes
of directed acyclic graphs. J. Mach. Learn. Res., 13:2409–2464, 2012.
URL http://dl.acm.org/citation.cfm?id=2503320.
Kevin Bello and Jean Honorio. Computationally and statistically efficient learning of causal bayes nets using path queries.
In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett,
editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information
Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pages 10954–10964, 2018.
URL https://proceedings.neurips.cc/paper/2018/hash/a0b45d1bb84fe1bedbb8449764c4d5d5-Abstract.html.
Frederick Eberhardt. Almost optimal intervention sets for causal discovery. In David A. McAllester and Petri Myllymäki,
editors, UAI 2008, Proceedings of the 24th Conference in Uncertainty in Artificial Intelligence, Helsinki, Finland, July
9-12, 2008, pages 161–168. AUAI Press, 2008.
URL https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=1948&proceeding_id=24.
Murat Kocaoglu, Karthikeyan Shanmugam, and Elias Bareinboim. Experimental design for learning causal graphs
with latent variables. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus,
S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30:
Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long
Beach, CA, USA, pages 7018–7028, 2017.
URL https://proceedings.neurips.cc/paper/2017/hash/291d43c696d8c3704cdbe0a72ade5f6c-Abstract.html.
Gyula Katona. On separating systems of a finite set. Journal of Combinatorial Theory, 2(1):174–194, 1966.
CAI Mao-cheng. On separating systems of graphs. Discrete Mathematics, 49(1):15–20, 1984. ISSN 0012-365X.
doi:10.1016/0012-365X(84)90146-8. URL https://www.sciencedirect.com/science/article/pii/0012365X84901468.
M. Studeny. Probabilistic Conditional Independence Structures. Information Science and Statistics. Springer London,
2010. ISBN 9781849969482. URL https://books.google.com.gi/books?id=bGFRcgAACAAJ.
Milan Studený, Jiří Vomlel, and Raymond Hemmecke. A geometric view on learning Bayesian network structures.
International Journal of Approximate Reasoning, 51:573–586, 06 2010b. doi:10.1016/j.ijar.2010.01.014.
Sridhar Mahadevan. Categoroids: Universal conditional independence, 2022b. URL https://arxiv.org/abs/2208.11077.
