The Classification of Reversible Bit Operations
Abstract
We present a complete classification of all possible sets of classical reversible gates acting
on bits, in terms of which reversible transformations they generate, assuming swaps and ancilla
bits are available for free. Our classification can be seen as the reversible-computing analogue
of Post’s lattice, a central result in mathematical logic from the 1940s. It is a step toward the
ambitious goal of classifying all possible quantum gate sets acting on qubits.
Our theorem implies a linear-time algorithm (which we have implemented) that takes as
input the truth tables of reversible gates G and H, and that decides whether G generates H.
Previously, this problem was not even known to be decidable (though with effort, one can derive
from abstract considerations an algorithm that takes triply-exponential time). The theorem
also implies that any n-bit reversible circuit can be “compressed” to an equivalent circuit, over
the same gates, that uses at most 2^n · poly(n) gates and O(1) ancilla bits; these are the first
upper bounds on these quantities known, and are close to optimal. Finally, the theorem implies
that every non-degenerate reversible gate can implement either every reversible transformation,
or every affine transformation, when restricted to an “encoded subspace.”
Briefly, the theorem says that every set of reversible gates generates either all reversible trans-
formations on n-bit strings (as the Toffoli gate does); no transformations; all transformations
that preserve Hamming weight (as the Fredkin gate does); all transformations that preserve
Hamming weight mod k for some k; all affine transformations (as the Controlled-NOT gate
does); all affine transformations that preserve Hamming weight mod 2 or mod 4, inner products
mod 2, or a combination thereof; or a previous class augmented by a NOT or NOTNOT gate.
Prior to this work, it was not even known that every class was finitely generated. Ruling out
the possibility of additional classes, not in the list, requires some arguments about polynomials,
lattices, and Diophantine equations.
Contents
1 Introduction
1.1 Classical Reversible Gates
1.2 Ground Rules
1.3 Our Results
1.4 Algorithmic and Complexity Aspects
1.5 Proof Ideas
1.6 Related Work
∗ MIT. Email: [email protected]. Supported by an Alan T. Waterman Award from the National Science Foundation, under grant no. 1249349.
† MIT. Email: [email protected]. Supported by an NSF Graduate Research Fellowship under Grant No. 1122374.
‡ MIT. Email: [email protected].
2 Notation and Definitions
2.1 Gates
2.2 Gate Classes
2.3 Alternative Kinds of Generation
10 Open Problems
11 Acknowledgments
1 Introduction
The pervasiveness of universality—that is, the likelihood that a small number of simple operations
already generate all operations in some relevant class—is one of the central phenomena in com-
puter science. It appears, among other places, in the ability of simple logic gates to generate all
Boolean functions (and of simple quantum gates to generate all unitary transformations); and in
the simplicity of the rule sets that lead to Turing-universality, or to formal systems to which Gödel’s
theorems apply. Yet precisely because universality is so pervasive, it is often more interesting to
understand the ways in which systems can fail to be universal.
In 1941, the great logician Emil Post [22] published a complete classification of all the ways in
which sets of Boolean logic gates can fail to be universal: for example, by being monotone (like the
AND and OR gates) or by being affine over F2 (like NOT and XOR). In universal algebra, closed
classes of functions are known, somewhat opaquely, as clones, while the inclusion diagram of all
Boolean clones is called Post’s lattice. Post’s lattice is surprisingly complicated, in part because
Post did not assume that the constant functions 0 and 1 were available for free.1
This paper had its origin in our ambition to find the analogue of Post’s lattice for all possible sets
of quantum gates acting on qubits. We view this as a large, important, and underappreciated goal:
something that could be to quantum computing theory almost what the Classification of Finite
Simple Groups was to group theory. To provide some context, there are many finite sets of 1-, 2-
and 3-qubit quantum gates that are known to be universal—either in the strong sense that they
can be used to approximate any n-qubit unitary transformation to any desired precision, or in the
weaker sense that they suffice to perform universal quantum computation (possibly in an encoded
subspace). To take two examples, Barenco et al. [5] showed universality for the CNOT gate plus
the set of all 1-qubit gates, while Shi [26] showed universality for the Toffoli and Hadamard gates.
There are also sets of quantum gates that are known not to be universal: for example, the basis-
preserving gates, the 1-qubit gates, and most interestingly, the so-called stabilizer gates [11, 3] (that
is, the CNOT, Hadamard, and π/4-Phase gates), as well as the stabilizer gates conjugated by 1-
qubit unitary transformations. What is not known is whether the preceding list basically exhausts
the ways in which quantum gates on qubits can fail to be universal. Are there other elegant
discrete structures, analogous to the stabilizer gates, waiting to be discovered? Are there any gate
sets, other than conjugated stabilizer gates, that might give rise to intermediate complexity classes,
neither contained in P nor equal to BQP?2 How can we claim to understand quantum circuits—the
bread-and-butter of quantum computing textbooks and introductory quantum computing courses—
if we do not know the answers to such questions?
Unfortunately, working out the full “quantum Post’s lattice” appears out of reach at present.
This might surprise readers, given how much is known about particular quantum gate sets (e.g.,
those containing CNOT gates), but keep in mind that what is asked for is an accounting of all pos-
sibilities, no matter how exotic. Indeed, even classifying 1- and 2-qubit quantum gate sets remains
wide open (!), and seems, without a new idea, to require studying the irreducible representations
1 In Appendix 12, we prove for completeness that if one does assume constants are free, then Post’s lattice dramatically simplifies, with all non-universal gate sets either monotone or affine.
2 To clarify, there are many restricted models of quantum computing known that are plausibly “intermediate” in
that sense, including BosonSampling [1], the one-clean-qubit model [15], and log-depth quantum circuits [8]. However,
with the exception of conjugated stabilizer gates, none of those models arises from simply considering which unitary
transformations can be generated by some set of k-qubit gates. They all involve non-standard initial states, building
blocks other than qubits, or restrictions on how the gates can be composed.
of thousands of groups. Recently, Aaronson and Bouland [2] completed a much simpler task, the
classification of 2-mode beamsplitters; that was already a complicated undertaking.
• the 2-bit CNOT (Controlled-NOT) gate, which flips the second bit if and only if the first bit
is 1;
• the 3-bit Toffoli gate, which flips the third bit if and only if the first two bits are both 1;
• the 3-bit Fredkin gate, which swaps the second and third bits if and only if the first bit is 1.
These three gates already illustrate some of the concepts that play important roles in this paper.
The CNOT gate can be used to copy information in a reversible way, since it maps x0 to xx; and also
to compute arbitrary affine functions over the finite field F2 . However, because CNOT is limited
to affine transformations, it is not computationally universal. Indeed, in contrast to the situation
with irreversible logic gates, one can show that no 2-bit classical reversible gate is computationally
universal. The Toffoli gate is computationally universal, because (for example) it maps x, y, 1 to
x, y, xy ⊕ 1, thereby computing the NAND function. Moreover, Toffoli showed [28]—and we prove for
completeness in Section 7.1—that the Toffoli gate is universal in a stronger sense: it generates all
possible reversible transformations F : {0, 1}n → {0, 1}n if one allows the use of ancilla bits, which
must be returned to their initial states by the end.
But perhaps the most interesting case is that of the Fredkin gate. Like the Toffoli gate,
the Fredkin gate is computationally universal: for example, it maps x, y, 0 to x, x̄y, xy, thereby
computing the AND function. But the Fredkin gate is not universal in the stronger sense. The
reason is that it is conservative: that is, it never changes the total Hamming weight of the input. Far
from being just a technical issue, conservativity was regarded by Fredkin and the other reversible
computing pioneers as a sort of discrete analogue of the conservation of energy—and indeed, it
plays a central role in certain physical realizations of reversible computing (for example, billiard-
ball models, in which the total number of billiard balls must be conserved).
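To make the three gates concrete, here is a minimal Python sketch (our illustration, not code from the paper) that represents each gate as a function on bit-tuples and spot-checks the properties just described: CNOT copies a bit, Toffoli computes NAND into its third bit, Fredkin never changes the Hamming weight, and all three are permutations.

```python
from itertools import product

def cnot(bits):
    """2-bit CNOT: flip the second bit iff the first bit is 1."""
    x, y = bits
    return (x, y ^ x)

def toffoli(bits):
    """3-bit Toffoli: flip the third bit iff the first two bits are both 1."""
    x, y, z = bits
    return (x, y, z ^ (x & y))

def fredkin(bits):
    """3-bit Fredkin: swap the second and third bits iff the first bit is 1."""
    x, y, z = bits
    return (x, z, y) if x == 1 else (x, y, z)

# CNOT copies: (x, 0) -> (x, x).
assert all(cnot((x, 0)) == (x, x) for x in (0, 1))

# Toffoli computes NAND into its third bit: (x, y, 1) -> (x, y, 1 xor xy).
assert all(toffoli((x, y, 1))[2] == 1 - (x & y) for x, y in product((0, 1), repeat=2))

# Fredkin is conservative: it never changes the Hamming weight.
assert all(sum(fredkin(b)) == sum(b) for b in product((0, 1), repeat=3))

# All three are reversible: each is a permutation of its input space.
for gate, k in [(cnot, 2), (toffoli, 3), (fredkin, 3)]:
    assert len({gate(b) for b in product((0, 1), repeat=k)}) == 2 ** k
```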
However, all we have seen so far are three specific examples of reversible gates, each leading
to a different behavior. To anyone with a mathematical mindset, the question remains: what
are all the possible behaviors? For example: is Hamming weight the only possible “conserved
quantity” in reversible computation? Are there other ways, besides being affine, to fail to be
computationally universal? Can one derive, from first principles, why the classes of reversible
transformations generated by CNOT, Fredkin, etc. are somehow special, rather than just pointing
to the sociological fact that these are classes that people in the early 1980s happened to study?
1.3 Our Results
Even after we assume that bit swaps and ancilla bits are free, it remains a significant undertaking
to work out the complete list of reversible gate classes, and (especially!) to prove that the list is
complete. Doing so is this paper’s main technical contribution.
We give a formal statement of the classification theorem in Section 3, and we show the lattice
of reversible gate classes in Figure 3. (In Appendix 14, we also calculate the exact number of 3-bit
gates that generate each class.) For now, let us simply state the main conclusions informally.
(1) Conserved Quantities. The following is the complete list of the “global quantities” that
reversible gate sets can conserve (if we restrict attention to non-degenerate gate sets, and
ignore certain complications caused by linearity and affineness): Hamming weight, Hamming
weight mod k for any k ≥ 2, and inner product mod 2 between pairs of inputs.
(2) Anti-Conservation. There are gates, such as the NOT gate, that “anti-conserve” the
Hamming weight mod 2 (i.e., always change it by a fixed nonzero amount). However, there
are no analogues of these for any of the other conserved quantities.
(3) Encoded Universality. In terms of their “computational power,” there are only three
kinds of reversible gate sets: degenerate (e.g., NOTs, bit-swaps), non-degenerate but affine
(e.g., CNOT), and non-affine (e.g., Toffoli, Fredkin). More interestingly, every non-affine
gate set can implement every reversible transformation, and every non-degenerate affine gate
set can implement every affine transformation, if the input and output bits are encoded by
longer strings in a suitable way. For details about “encoded universality,” see Section 4.4.
(4) Sporadic Gate Sets. The conserved quantities interact with linearity and affineness in
complicated ways, producing “sporadic” affine gate sets that we have classified. For example,
non-degenerate affine gates can preserve Hamming weight mod k, but only if k = 2 or k = 4.
All gates that preserve inner product mod 2 are linear, and all linear gates that preserve
Hamming weight mod 4 also preserve inner product mod 2. As a further complication, affine
gates can be orthogonal or mod-2-preserving or mod-4-preserving in their linear part, but not
in their affine part.
(5) Finite Generation. For each closed class of reversible transformations, there is a single
gate that generates the entire class. (A priori, it is not even obvious that every class is finitely
generated, or that there is “only” a countable infinity of classes!) For more, see Section 4.1.
(6) Symmetry. Every reversible gate set is symmetric under interchanging the roles of 0 and
1. For more, see Section 4.1.
Gi ’s generate H. Then we obtain a linear-time algorithm for RevGen. Here, of course, “linear”
means linear in the sizes of the truth tables, which is n · 2^n for an n-bit gate. However, if just a
tiny amount of “summary data” about each gate G is provided—namely, the possible values of
|G (x)| − |x|, where |·| is the Hamming weight, as well as which affine transformation G performs if
it is affine—then the algorithm actually runs in O(n^ω) time, where ω is the matrix multiplication
exponent.
We have implemented this algorithm; code is available for download at [24]. For more details
see Section 4.2.
Our classification theorem also implies the first general upper bounds (i.e., bounds that hold
for all possible gate sets) on the number of gates and ancilla bits needed to implement reversible
transformations. In particular, we show (see Section 4.3) that if a set of reversible gates generates
an n-bit transformation F at all, then it does so via a circuit with at most 2^n · poly(n) gates and
O(1) ancilla bits. These bounds are close to optimal.
By contrast, let us consider the situation for these problems without the classification theorem.
Suppose, for example, that we want to know whether a reversible transformation H : {0, 1}n →
{0, 1}n can be synthesized using gates G1 , . . . , GK . If we knew some upper bound on the number
of ancilla bits that might be needed by the generating circuit, then if nothing else, we could of
course solve this problem by brute force. The trouble is that, without the classification, it is not
obvious how to prove any upper bound on the number of ancillas—not even, say, Ackermann (n).
This makes it unclear, a priori, whether RevGen is even decidable, never mind its complexity!
One can show on abstract grounds that RevGen is decidable, but with an astronomical running
time. To explain this requires a short digression. In universal algebra, there is a body of theory
(see e.g. [18]), which grew out of Post’s original work [22], about the general problem of classifying
closed classes of functions (clones) of various kinds. The upshot is that every clone is characterized
by an invariant that all functions in the clone preserve: for example, affineness for the NOT and
XOR functions, or monotonicity for the AND and OR functions. The clone can then be shown
to contain all functions that preserve the invariant. (There is a formal definition of “invariant,”
involving polymorphisms, which makes this statement not a tautology, but we omit it.) Alongside
the lattice of clones of functions, there is a dual lattice of coclones of invariants, and there is a
Galois connection relating the two: as one adds more functions, one preserves fewer invariants, and
vice versa.
In response to an inquiry by us, Emil Jeřábek recently showed [12] that the clone/coclone
duality can be adapted to the setting of reversible gates. This means that we know, even without
a classification theorem, that every closed class of reversible transformations is uniquely determined
by the invariants that it preserves.
Unfortunately, this elegant characterization does not give rise to feasible algorithms. The
reason is that, for an n-bit gate G : {0, 1}^n → {0, 1}^n, the invariants could in principle involve
all 2^n inputs, as well as arbitrary polymorphisms mapping those inputs into a commutative monoid.
Thus the number of polymorphisms one needs to consider grows at least like 2^(2^(2^n)). Now, the word
problem for commutative monoids is decidable, by reduction to the ideal membership problem (see,
e.g., [14, p. 55]). And by putting these facts together, one can derive an algorithm for RevGen
that uses doubly-exponential space and triply-exponential time, as a function of the truth table
sizes: in other words, exp (exp (exp (exp (n)))) time, as a function of n. We believe it should also
be possible to extract exp (exp (exp (exp (n)))) upper bounds on the number of gates and ancillas
from this algorithm, although we have not verified the details.
1.5 Proof Ideas
We hope we have made the case that the classification theorem improves the complexity situation for
reversible circuit synthesis! Even so, some people might regard classifying all possible reversible
gate sets as a complicated, maybe worthwhile, but fundamentally tedious exercise. Can’t such
problems be automated via computer search? On the contrary, there are specific aspects of
reversible computation that make this classification problem both unusually rich, and unusually
hard to reduce to any finite number of cases.
We already discussed the astronomical number of possible invariants that even a tiny reversible
gate (say, a 3-bit gate) might satisfy, and the hopelessness of enumerating them by brute force.
However, even if we could cut down the number of invariants to something reasonable, there
would still be the problem that the size, n, of a reversible gate can be arbitrarily large—and as
one considers larger gates, one can discover more and more invariants. Indeed, that is precisely
what happens in our case, since the Hamming weight mod k invariant can only be “noticed” by
considering gates on k bits or more. There are also “sporadic” affine classes that can only be found
by considering 6-bit gates.
Of course, it is not hard just to guess a large number of reversible gate classes (affine transfor-
mations, parity-preserving and parity-flipping transformations, etc.), prove that these classes are
all distinct, and then prove that each one can be generated by a simple set of gates (e.g., CNOT or
Fredkin + NOT). Also, once one has a sufficiently powerful gate (say, the CNOT gate), it is often
straightforward to classify all the classes containing that gate. So for example, it is relatively easy
to show that CNOT, together with any non-affine gate, generates all reversible transformations.
As usual with classification problems, the hard part is to rule out exotic additional classes: most
of the work, one might say, is not about what is there, but about what isn’t there. It is one thing
to synthesize some random 1000-bit reversible transformation using only Toffoli gates, but quite
another to synthesize a Toffoli gate using only the random 1000-bit transformation!
Thinking about this brings to the fore the central issue: that in reversible computation, it is
not enough to output some desired string F (x); one needs to output nothing else besides F (x).
And hence, for example, it does not suffice to look inside the random 1000-bit reversible gate G,
to show that it contains a NAND gate, which is computationally universal. Rather, one needs to
deal with all of G’s outputs, and show that one can eliminate the undesired ones.
The way we do that involves another characteristic property of reversible circuits: that they
can have “global conserved quantities,” such as Hamming weight. Again and again, we need to
prove that if a reversible gate G fails to conserve some quantity, such as the Hamming weight mod
k, then that fact alone implies that we can use G to implement a desired behavior. This is where
elementary algebra and number theory come in.
There are two aspects to the problem. First, we need to understand something about the
possible quantities that a reversible gate can conserve. For example, we will need the following
three results:
• No reversible gate can change Hamming weight mod k by a fixed, nonzero amount, unless
k = 2.
We prove each of these statements in Section 6, using arguments based on complex polynomi-
als. In Appendix 15, we give alternative, more “combinatorial” proofs for the second and third
statements.
Next, using our knowledge about the possible conserved quantities, we need procedures that
take any gate G that fails to conserve some quantity, and that use G to implement a desired
behavior (say, making a single copy of a bit, or changing an inner product by exactly 1). We then
leverage that behavior to generate a desired gate (say, a Fredkin gate). The two core tasks turn
out to be the following:
• Given any non-affine gate, we need to construct a Fredkin gate. We do this in Sections 8.3
and 8.4.
• Given any non-orthogonal linear gate, we need to construct a CNOTNOT gate, a parity-
preserving version of CNOT that maps x, y, z to x, y ⊕ x, z ⊕ x. We do this in Section
9.2.
In both of these cases, our solution involves 3-dimensional lattices: that is, subsets of Z^3 closed
under integer linear combinations. We argue, in essence, that the only possible obstruction to
the desired behavior is a “modularity obstruction,” but the assumption about the gate G rules out
such an obstruction.
We can illustrate this with an example that ends up not being needed in the final classification
proof, but that we worked out earlier in this research.5 Let G be any gate that does not conserve
(or anti-conserve) the Hamming weight mod k for any k ≥ 2, and suppose we want to use G to
construct a CNOT gate.
[Figure 1: a lattice in Z², with the points (1, 0) and (2, 0) marked.]
Then we examine how G behaves on restricted inputs: in this case, on inputs that consist entirely
of some number of copies of x and x̄, where x ∈ {0, 1} is a bit, as well as constant 0 and 1 bits.
5 In general, after completing the classification proof, we were able to go back and simplify it substantially, by
removing results—for example, about the generation of CNOT gates—that were important for working out the lattice
in the first place, but which then turned out to be subsumed (or which could be subsumed, with modest additional
effort) by later parts of the classification. Our current proof reflects these simplifications.
For example, perhaps G can increase the number of copies of x by 5 while decreasing the number
of copies of x̄ by 7, and can also decrease the number of copies of x by 6 without changing the
number of copies of x̄. Whatever the case, the set of possible behaviors generates some lattice: in
this case, a lattice in Z^2 (see Figure 1). We need to argue that the lattice contains a distinguished
point encoding the desired “copying” behavior. In the case of the CNOT gate, the point is (1, 0),
since we want one more copy of x and no more copies of x̄. Showing that the lattice contains
(1, 0), in turn, boils down to arguing that a certain system of Diophantine linear equations must
have a solution. One can do this, finally, by using the assumption that G does not conserve or
anti-conserve the Hamming weight mod k for any k.
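As a small illustration of the kind of "modularity obstruction" argument described here, the following Python sketch (our own, not from the paper) decides whether a target point such as (1, 0) lies in the integer lattice generated by a given set of behavior vectors in Z², by reducing the generators with integer row operations; the only thing that can keep the target out is a divisibility (i.e., modularity) condition.

```python
from math import gcd

def ext_gcd(a, b):
    """Extended Euclid: return (g, s, t) with s*a + t*b == g == gcd(a, b) >= 0."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    g, s, t = ext_gcd(b, a % b)
    return (g, t, s - (a // b) * t)

def lattice_contains_2d(generators, target):
    """Decide whether `target` lies in the sublattice of Z^2 spanned by `generators`."""
    a1, b1 = 0, 0      # pivot row (a1, b1) with a1 != 0, once found
    rest = []          # second coordinates of rows whose first coordinate is 0
    for a, b in generators:
        if a == 0:
            rest.append(b)
        elif a1 == 0:
            a1, b1 = a, b
        else:
            # Unimodular row operations: replace the two rows by
            # (g, s*b1 + t*b) and (0, (a1/g)*b - (a/g)*b1), preserving the lattice.
            g, s, t = ext_gcd(a1, a)
            rest.append((a1 // g) * b - (a // g) * b1)
            a1, b1 = g, s * b1 + t * b
    b2 = 0
    for b in rest:
        b2 = gcd(b2, abs(b))
    x, y = target
    if a1 == 0:
        return x == 0 and (y == 0 if b2 == 0 else y % b2 == 0)
    if x % a1 != 0:
        return False
    y_left = y - (x // a1) * b1
    return y_left == 0 if b2 == 0 else y_left % b2 == 0

# (1, 0) = (3, 1) - (2, 1) lies in the first lattice, but not in the second,
# where every generator has an even first coordinate.
assert lattice_contains_2d([(2, 1), (3, 1)], (1, 0))
assert not lattice_contains_2d([(2, 1), (0, 3)], (1, 0))
```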
To generate the Fredkin gate, we instead use the Chinese Remainder Theorem to combine gates
that change the inner product mod p for various primes p into a gate that changes the inner product
between two inputs by exactly 1; while to generate the CNOTNOT gate, we exploit the assumption
that our generating gates are linear. In all these cases, it is crucial that we know, from Section 6,
that certain quantities cannot be conserved by any reversible gate.
There are a few parts of the classification proof (for example, Section 9.4, on affine gate sets)
that basically do come down to enumerating cases, but we hope to have given a sense for the
interesting parts.
2 Notation and Definitions
F2 means the field of 2 elements. [n] means {1, . . . , n}. We denote by e1 , . . . , en the standard
basis for the vector space F_2^n: that is, e1 = (1, 0, . . . , 0), etc.
Let x = x1 . . . xn be an n-bit string. Then x̄ means x with all n of its bits inverted. Also, x ⊕ y
means bitwise XOR, x, y or xy means concatenation, x^k means the concatenation of k copies of x,
and |x| means the Hamming weight. The parity of x is |x| mod 2. The inner product of x and y
is the integer x · y = x1 y1 + · · · + xn yn . Note that
x · (y ⊕ z) ≡ x · y + x · z (mod 2) ,
but the above need not hold if we are not working mod 2.
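As a quick sanity check of this identity, and of the caveat that it can fail over the integers, here is a tiny Python test (ours) over all 3-bit strings.

```python
from itertools import product

def inner(x, y):
    """Integer inner product of two bit-tuples."""
    return sum(a * b for a, b in zip(x, y))

def xor(y, z):
    return tuple(a ^ b for a, b in zip(y, z))

triples = list(product(product((0, 1), repeat=3), repeat=3))
# The identity holds mod 2 ...
assert all((inner(x, xor(y, z)) - inner(x, y) - inner(x, z)) % 2 == 0 for x, y, z in triples)
# ... but not, in general, over the integers.
assert any(inner(x, xor(y, z)) != inner(x, y) + inner(x, z) for x, y, z in triples)
```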
By gar (x), we mean garbage depending on x: that is, “scratch work” that a reversible compu-
tation generates along the way to computing some desired function f (x). Typically, the garbage
later needs to be uncomputed. Uncomputing, a term introduced by Bennett [7], simply means
running an entire computation in reverse, after the output f (x) has been safely stored.
2.1 Gates
By a (reversible) gate, throughout this paper we will mean a reversible transformation G on the
set of k-bit strings: that is, a permutation of {0, 1}k , for some fixed k. Formally, the terms
‘gate’ and ‘reversible transformation’ will mean the same thing; ‘gate’ just connotes a reversible
transformation that is particularly small or simple.
A gate is nontrivial if it does something other than permute its input bits, and non-degenerate
if it does something other than permute its input bits and/or apply NOT’s to some subset of them.
A gate G is conservative if it satisfies |G (x)| = |x| for all x. A gate is mod-k-respecting if there
exists a j such that
|G (x)| ≡ |x| + j (mod k)
for all x. It’s mod-k-preserving if moreover j = 0. It’s mod-preserving if it’s mod-k-preserving for
some k ≥ 2, and mod-respecting if it’s mod-k-respecting for some k ≥ 2.
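These definitions translate directly into a truth-table check; the following Python sketch (ours, with illustrative function names) computes the set of weight shifts of a gate and tests whether it is mod-k-respecting or mod-k-preserving.

```python
from itertools import product

def weight_shifts(gate, nbits):
    """The set of values |G(x)| - |x| over all inputs x."""
    return {sum(gate(x)) - sum(x) for x in product((0, 1), repeat=nbits)}

def mod_k_shift(gate, nbits, k):
    """Return j if |G(x)| = |x| + j (mod k) for all x (G is mod-k-respecting),
    or None otherwise; G is mod-k-preserving iff the returned j is 0."""
    shifts = {d % k for d in weight_shifts(gate, nbits)}
    return shifts.pop() if len(shifts) == 1 else None

def fredkin(bits):
    x, y, z = bits
    return (x, z, y) if x else (x, y, z)

def not_gate(bits):
    return (1 - bits[0],)

assert weight_shifts(fredkin, 3) == {0}     # Fredkin is conservative
assert mod_k_shift(not_gate, 1, 2) == 1     # NOT is parity-flipping (j = 1)
assert mod_k_shift(not_gate, 1, 3) is None  # but NOT is not mod-3-respecting
```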
As special cases, a mod-2-respecting gate is also called parity-respecting, a mod-2-preserving
gate is called parity-preserving, and a gate G such that
|G (x)| ≡ |x| + 1 (mod 2)
for all x is called parity-flipping. In Theorem 12, we will prove that parity-flipping gates are the
only examples of mod-respecting gates that are not mod-preserving.
The respecting number of a gate G, denoted k (G), is the largest k such that G is mod-k-
respecting. (By convention, if G is conservative then k (G) = ∞, while if G is non-mod-respecting
then k (G) = 1.) We have the following fact:
A gate G is affine if it implements an affine transformation over F2 : that is, if there exists an
invertible matrix A ∈ F_2^{k×k}, and a vector b ∈ F_2^k, such that G (x) = Ax ⊕ b for all x. A gate is
linear if moreover b = 0. A gate is orthogonal if it satisfies
G (x) · G (y) ≡ x · y (mod 2)
for all x, y. (We will observe, in Lemma 14, that every orthogonal gate is linear.) Also, if
G (x) = Ax ⊕ b is affine, then the linear part of G is the linear transformation G′ (x) = Ax. We
call G orthogonal in its linear part, mod-k-preserving in its linear part, etc. if G′ satisfies the
corresponding invariant. A gate that is orthogonal in its linear part is also called an isometry.
Given two gates G and H, their tensor product, G ⊗ H, is a gate that applies G and H to
disjoint sets of bits. We will often use the tensor product to produce a single gate that combines
the properties of two previous gates. Also, we denote by G^{⊗t} the tensor product of t copies of G.
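On bit-tuples, the tensor product is simply the two gates applied to disjoint slices of the input; a minimal Python sketch (ours) follows.

```python
def tensor(g, k_g, h, k_h):
    """Return the (k_g + k_h)-bit gate that applies g to the first k_g bits
    and h to the remaining k_h bits."""
    def combined(bits):
        assert len(bits) == k_g + k_h
        return g(bits[:k_g]) + h(bits[k_g:])
    return combined

def not_gate(bits):
    return tuple(1 - b for b in bits)

# NOTNOT = NOT tensor NOT.
notnot = tensor(not_gate, 1, not_gate, 1)
assert notnot((0, 1)) == (1, 0)
```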
(1) Base case. ⟨S⟩ contains S, as well as the identity function F (x1 . . . xn ) = x1 . . . xn for all
n ≥ 1.
(2) Composition rule. If ⟨S⟩ contains F (x1 . . . xn ) and G (x1 . . . xn ), then ⟨S⟩ also contains
F (G (x1 . . . xn )).
(3) Swapping rule. If ⟨S⟩ contains F (x1 . . . xn ), then ⟨S⟩ also contains all possible functions
σ(F (xτ(1) . . . xτ(n) )) obtained by permuting F ’s input and output bits.
(4) Extension rule. If ⟨S⟩ contains F (x1 . . . xn ), then ⟨S⟩ also contains the function
G (x1 . . . xn , b) := (F (x1 . . . xn ) , b) ,
in which the extra bit b is simply carried along unchanged.
(5) Ancilla rule. If ⟨S⟩ contains a function F of the form
F (x1 . . . xn , a1 . . . ak ) = (G (x1 . . . xn ) , a1 . . . ak )
for some smaller function G and fixed “ancilla” string a1 . . . ak ∈ {0, 1}^k that do not depend
on x, then ⟨S⟩ also contains G. (Note that, if the ai ’s are set to other values, then F need
not have the above form.)
Note that because of reversibility, the set of n-bit transformations in ⟨S⟩ (for any n) always forms
a group. Indeed, if ⟨S⟩ contains F , then clearly ⟨S⟩ contains all the iterates F^2 (x) = F (F (x)),
etc. But since there must be some positive integer m such that F^m (x) = x, this means that
F^{m−1} (x) = F^{−1} (x). Thus, we do not need a separate rule stating that ⟨S⟩ is closed under
inverses.
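This closure argument is easy to see in code: iterating a permutation of a finite set eventually returns to the identity, so some power of F equals F⁻¹. A short Python illustration (ours):

```python
from itertools import product

def compose(f, g):
    return lambda x: f(g(x))

def inverse_by_iteration(gate, nbits):
    """Return F^(m-1), where m is the order of F as a permutation of {0,1}^nbits;
    since F^m is the identity, this power is exactly F^(-1)."""
    domain = list(product((0, 1), repeat=nbits))
    power = gate                                   # current power F^j, starting at j = 1
    while any(compose(gate, power)(x) != x for x in domain):
        power = compose(gate, power)               # F^(j+1)
    return power

def cnot(bits):
    x, y = bits
    return (x, x ^ y)

inv = inverse_by_iteration(cnot, 2)
assert all(inv(cnot(x)) == x for x in product((0, 1), repeat=2))
```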
We say S generates the reversible transformation F if F ∈ ⟨S⟩. We also say that S generates
⟨S⟩. If ⟨S⟩ equals the set of all permutations of {0, 1}^n, for all n ≥ 1, then we call S universal.
Given an arbitrary set C of reversible transformations, we call C a reversible gate class (or class
for short) if C is closed under rules (2)–(5) above: in other words, if there exists an S such that
C = ⟨S⟩.
A reversible circuit for the function F , over the gate set S, is an explicit procedure for generating
F by applying gates in S, and thereby showing that F ∈ ⟨S⟩. An example is shown in Figure 2.
Reversible circuit diagrams are read from left to right, with each bit that occurs in the circuit (both
input and ancilla bits) represented by a horizontal line, and each gate represented by a vertical line.
If every gate G ∈ S satisfies some invariant, then we can also describe S and ⟨S⟩ as satisfying
that invariant. So for example, the set {CNOTNOT, NOT} is affine and parity-respecting, and so
is the class that it generates. Conversely, S violates an invariant if any G ∈ S violates it.
Just as we defined the respecting number k (G) of a gate, we would like to define the respecting
number k (S) of an entire gate set. To do so, we need a proposition about the behavior of k (G)
under tensor products.
[Figure 2: Generating a Controlled-Controlled-Swap gate from Fredkin.]
which H is consistent. As an example, COPY is the 2-bit partial reversible gate defined by the
following relations:
COPY (00) = 00, COPY (10) = 11.
If a gate set S can implement the above behavior, using ancilla bits that are returned to their
original states by the end, then we say S “generates COPY”; the behavior on inputs 01 and 11 is
irrelevant. Note that COPY is consistent with CNOT. One can think of COPY as a bargain-
basement CNOT, but one that might be bootstrapped up to a full CNOT with further effort.
Generation With Garbage. Let D ⊆ {0, 1}m , and H : D → {0, 1}n be some function, which
need not be injective or surjective, or even have the same number of input and output bits. Then we
say that a reversible gate set S generates H with garbage if there exists a reversible transformation
G ∈ ⟨S⟩, as well as an ancilla string a and a function gar, such that G (x, a) = (H (x) , gar (x)) for
all x ∈ D. As an example, consider the ordinary 2-bit AND function, from {0, 1}2 to {0, 1}. Since
AND destroys information, clearly no reversible gate can generate it in the usual sense, but many
reversible gates can generate AND with garbage: for instance, the Toffoli and Fredkin gates, as we
saw in Section 1.1.
Encoded Universality. This is a concept borrowed from quantum computing [4]. In our
setting, encoded universality means that there is some way of encoding 0’s and 1’s by longer
strings, such that our gate set can implement any desired transformation on the encoded bits.
Note that, while this is a weaker notion of universality than the ability to generate arbitrary
permutations of {0, 1}n , it is stronger than “merely” computational universality, because it still
requires a transformation to be performed reversibly, with no garbage left around. Formally, given
a reversible gate set S, we say that S supports encoded universality if there are k-bit strings α (0)
and α (1) such that for every n-bit reversible transformation F (x1 . . . xn ) = y1 . . . yn , there exists
a transformation G ∈ ⟨S⟩ that satisfies
G (α (x1 ) . . . α (xn )) = α (y1 ) . . . α (yn )
for all x ∈ {0, 1}n . Also, we say that S supports affine encoded universality if this is true for every
affine F .
As a well-known example, the Fredkin gate is not universal in the usual sense, because it
preserves Hamming weight. But it is easy to see that Fredkin supports encoded universality,
using the so-called dual-rail encoding, in which every 0 bit is encoded as 01, and every 1 bit is
encoded as 10. In Section 4.4, we will show, as a consequence of our classification theorem, that
every reversible gate set (except for degenerate sets) supports either encoded universality or affine
encoded universality.
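To see the dual-rail encoding in action, here is a short Python check (our illustration, consistent with the construction spelled out later in the proof of Theorem 10) that a single Fredkin gate, applied to one rail of the control and both rails of the target, acts as a CNOT on the encoded bits.

```python
from itertools import product

def fredkin(x, y, z):
    return (x, z, y) if x else (x, y, z)

def encode(b):
    """Dual-rail encoding: 0 -> 01, 1 -> 10."""
    return (b, 1 - b)

def encoded_cnot(ctrl_pair, tgt_pair):
    """CNOT on dual-rail-encoded bits: a Fredkin controlled by the first rail of the
    control swaps the target's two rails, flipping the encoded target iff control = 1."""
    c0, c1 = ctrl_pair
    t0, t1 = tgt_pair
    c0, t0, t1 = fredkin(c0, t0, t1)
    return (c0, c1), (t0, t1)

for a, b in product((0, 1), repeat=2):
    ctrl, tgt = encoded_cnot(encode(a), encode(b))
    assert ctrl == encode(a) and tgt == encode(a ^ b)
```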
Loose Generation. Finally, we say that a gate set S loosely generates a reversible transfor-
mation F : {0, 1}^n → {0, 1}^n, if there exists a transformation G ∈ ⟨S⟩, as well as ancilla strings a
and b, such that
G (x, a) = (F (x) , b)
for all x ∈ {0, 1}n . In other words, G is allowed to change the ancilla bits, so long as they change
in a way that is independent of the input x. Under this rule, one could perhaps tell by examining
the ancilla bits that G was applied, but one could not tell to which input. This suffices for some
applications of reversible computing, though not for others.7
7 For example, if G were applied to a quantum superposition, then it would still maintain coherence among all the
inputs to which it was applied—though perhaps not between those inputs and other inputs in the superposition to
which it was not applied.
3 Stating the Classification Theorem
In this section we state our main result, and make a few preliminary remarks about it. First let
us define the gates that appear in the classification theorem.
• NOTNOT, or NOT⊗2 , is the 2-bit gate that maps xy to x̄ȳ. NOTNOT is a parity-preserving
variant of NOT.
• Toffoli (also called Controlled-Controlled-NOT, or CCNOT) is the 3-bit gate that maps x, y, z
to x, y, z ⊕ xy.
• Fredkin (also called Controlled-SWAP, or CSWAP) is the 3-bit gate that maps x, y, z to
x, y ⊕ x (y ⊕ z) , z ⊕ x (y ⊕ z). In other words, it swaps y with z if x = 1, and does nothing
if x = 0. Fredkin is conservative: it never changes the Hamming weight.
• Ck is a k-bit gate that maps 0^k to 1^k and 1^k to 0^k, and all other k-bit strings to themselves.
Ck preserves the Hamming weight mod k. Note that C1 = NOT, while C2 is equivalent to
NOTNOT, up to a bit-swap.
• Tk is a k-bit gate (for even k) that maps x to x̄ if |x| is odd, or to x itself if |x| is even. A different
definition is
Tk (x1 . . . xk ) = (x1 ⊕ bx , . . . , xk ⊕ bx ) ,
where bx := x1 ⊕ · · · ⊕ xk . This shows that Tk is linear. Indeed, we also have
Tk (x) · Tk (y) ≡ x · y (mod 2)
for all x, y, which shows that Tk is orthogonal. Note also that, if k ≡ 2 (mod 4), then Tk preserves
Hamming weight mod 4: if |x| is even then |Tk (x)| = |x|, while if |x| is odd then
|Tk (x)| = k − |x| ≡ |x| (mod 4).
• Fk is a k-bit gate (for even k) that maps x to x̄ if |x| is even, or to x itself if |x| is odd. A different
definition is
Fk (x1 . . . xk ) = (x1 ⊕ bx ⊕ 1, . . . , xk ⊕ bx ⊕ 1) ,
with bx := x1 ⊕ · · · ⊕ xk as before. (These gate definitions are spot-checked computationally in the sketch below.)
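Here is the promised sketch (ours): it implements C_k, T_k, and F_k on bit-tuples and checks the stated properties for small k.

```python
from itertools import product

def C(x):
    """C_k: maps 0^k <-> 1^k and fixes every other string."""
    if all(b == 0 for b in x):
        return tuple(1 for _ in x)
    if all(b == 1 for b in x):
        return tuple(0 for _ in x)
    return x

def T(x):
    """T_k (k even): complement x iff |x| is odd."""
    b = sum(x) % 2
    return tuple(xi ^ b for xi in x)

def F(x):
    """F_k (k even): complement x iff |x| is even."""
    b = 1 - (sum(x) % 2)
    return tuple(xi ^ b for xi in x)

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cube(k):
    return list(product((0, 1), repeat=k))

# C_5 preserves Hamming weight mod 5.
assert all((sum(C(x)) - sum(x)) % 5 == 0 for x in cube(5))
# T_4 is orthogonal: it preserves inner products mod 2.
assert all((dot(T(x), T(y)) - dot(x, y)) % 2 == 0 for x in cube(4) for y in cube(4))
# T_6 (with 6 = 2 mod 4) preserves Hamming weight mod 4.
assert all((sum(T(x)) - sum(x)) % 4 == 0 for x in cube(6))
# F_4 is affine but not linear: F_4(x) = T_4(x) xor 1111.
assert all(F(x) == tuple(b ^ 1 for b in T(x)) for x in cube(4))
```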
We can now state the classification theorem.
Theorem 3 (Main Result) Every set of reversible gates generates one of the following classes:
10. Classes 1, 3, 7, 8, or 9 augmented by a NOTNOT gate (note: 7 and 8 become equivalent this
way).
11. Classes 1, 3, 6, 7, 8, or 9 augmented by a NOT gate (note: 7 and 8 become equivalent this
way).
Furthermore, all the above classes are distinct except when noted otherwise, and they fit together
in the lattice diagram shown in Figure 3.8
Let us make some comments about the structure of the lattice. The lattice has a countably
infinite number of classes, with the one infinite part given by the mod-k-preserving classes. The
mod-k-preserving classes are partially ordered by divisibility, which means, for example, that the
lattice is not planar.9 While there are infinite descending chains in the lattice, there is no infinite
ascending chain. This means that, if we start from some reversible gate class and then add new
gates that extend its power, we must terminate after finitely many steps with the class of all
reversible transformations.
In Appendix 13, we will prove that if we allow loose generation, then the only change to Theorem
3 is that every C + NOTNOT class collapses with the corresponding C + NOT class.
8 Let us mention that Fredkin + NOTNOT generates the class of all parity-preserving transformations, while
Fredkin + NOT generates the class of all parity-respecting transformations. We could have listed the parity-preserving
transformations as a special case of the mod-k-preserving transformations: namely, the case k = 2. If we had done
so, though, we would have had to include the caveat that Ck only generates all mod-k-preserving transformations
when k ≥ 3 (when k = 2, we also need Fredkin in the generating set). And in any case, the parity-respecting class
would still need to be listed separately.
9 For consider the graph with the integers 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 18, 20, 21, 24, and 28 as its vertices,
and with an edge between each pair whose ratio is a prime. One can check that this graph contains K3,3 as a minor.
[Figure 3: The lattice of reversible gate classes, from ⊤ (all transformations) down to ⊥ (the trivial class). Node labels visible in the diagram include Fredkin, CNOT, CNOTNOT, MOD2, MOD8, F4, T4, T6, NOT, and NOTNOT, with the non-affine, affine, isometry, and degenerate regions marked.]
4 Consequences of the Classification
To illustrate the power of the classification theorem, in this section we use it to prove four general
implications for reversible computation. While these implications are easy to prove with the
classification in hand, we do not know how to prove any of them without it.
Corollary 4 Every reversible gate class C is finitely generated: that is, there exists a finite set S
such that C = hSi.
Proof. This is immediate for all the classes listed in Theorem 3, except the ones involving NOT
or NOTNOT gates. For classes of the form C = ⟨G, NOT⟩ or C = ⟨G, NOTNOT⟩, we just need a
single gate G′ that is clearly generated by C, and clearly not generated by a smaller class. We can
then appeal to Theorem 3 to assert that G′ must generate C. For each of the relevant G’s—namely,
Fredkin, CNOTNOT, F4 , and T6 —one such G′ is the tensor product, G ⊗ NOT or G ⊗ NOTNOT.
We also wish to point out a non-obvious symmetry property that follows from the classification
theorem. Given an n-bit reversible transformation F , let F ∗ , or the dual of F , be the transformation
that complements its input, applies F , and then complements the output: F ∗ (x) := the bitwise complement of F (x̄). The dual can be thought of as F with the roles of 0 and 1 interchanged: for example,
Toffoli∗ (xyz) flips z if and only if x = y = 0. Also, call a gate F self-dual if F ∗ = F , and call a
reversible gate class C dual-closed if F ∗ ∈ C whenever F ∈ C. Then:
Proposition (Duality) Every reversible gate class C is dual-closed.

Proof. This is obvious for all the classes listed in Theorem 3 that include a NOT or NOTNOT gate.
For the others, we simply need to consider the classes one by one: the notions of “conservative,”
“mod-k-respecting,” and “mod-k-preserving” are manifestly the same after we interchange 0 and 1.
This is less manifest for the notion of “orthogonal,” but one can check that Tk and Fk are self-dual
for all even k.
Proof. It suffices to give a linear-time algorithm that takes as input the truth table of a single
reversible transformation G : {0, 1}^n → {0, 1}^n, and that decides which class it generates. For we
can then compute ⟨G1 , . . . , GK ⟩ by taking the least upper bound of ⟨G1 ⟩ , . . . , ⟨GK ⟩, and can also
solve the membership problem by checking whether
⟨G1 , . . . , GK ⟩ = ⟨G1 , . . . , GK , H⟩ .
The algorithm is as follows: first, make a single pass through G’s truth table, in order to answer
the following two questions:
• What is the set W (G) of all values |G (x)| − |x|, over inputs x ∈ {0, 1}^n?
• Is G affine, and if so, for which matrix A and vector b does G (x) = Ax ⊕ b hold?
In any reasonable RAM model, both questions can easily be answered in O (n · 2^n) time, which
is the number of bits in G’s truth table.
If G is non-affine, then Theorem 3 implies that we can determine ⟨G⟩ from W (G) alone. If G is
affine, then Theorem 3 implies we can determine ⟨G⟩ from (A, b) alone, though it is also convenient
to use W (G). We need to take the gcd of the numbers in W (G), check whether A is orthogonal,
etc., but the time needed for these operations is only poly (n), which is negligible compared to the
input size of n · 2^n.
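The single pass described in this proof is straightforward to implement; the sketch below (ours, with illustrative names, not the released code [24]) collects the weight-shift set W(G) and tests affineness by reading a candidate (A, b) off G's values on 0^n and the unit vectors and checking it against every truth-table entry.

```python
from itertools import product

def analyze(gate, n):
    """One pass over an n-bit gate's truth table.  Returns (W, affine), where
    W = { |G(x)| - |x| : x } and affine = (columns of A, b) if G(x) = Ax xor b, else None."""
    b = gate((0,) * n)                         # candidate offset: b = G(0^n)

    def unit(i):
        return tuple(int(j == i) for j in range(n))

    cols = [tuple(g ^ bb for g, bb in zip(gate(unit(i)), b)) for i in range(n)]
    W, is_affine = set(), True
    for x in product((0, 1), repeat=n):
        gx = gate(x)
        W.add(sum(gx) - sum(x))
        pred = b                               # predicted value: b xor (xor of columns with x_i = 1)
        for i, xi in enumerate(x):
            if xi:
                pred = tuple(p ^ c for p, c in zip(pred, cols[i]))
        is_affine = is_affine and pred == gx
    return W, ((cols, b) if is_affine else None)

def cnot(x):
    return (x[0], x[0] ^ x[1])

def toffoli(x):
    return (x[0], x[1], x[2] ^ (x[0] & x[1]))

W, aff = analyze(cnot, 2)
assert aff is not None and W == {-1, 0, 1}     # CNOT is affine
W, aff = analyze(toffoli, 3)
assert aff is None and W == {-1, 0, 1}         # Toffoli is not affine
```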
We have implemented the algorithm described in Theorem 7, and Java code is available for
download [24].
Theorem 8 Let R be a reversible circuit, over any gate set S, that maps {0, 1}n to {0, 1}n , using
an unlimited number of gates and ancilla bits. Then there is another reversible circuit, over the
same gate set S, that applies the same transformation as R does, and that uses only 2^n · poly(n)
gates and O(1) ancilla bits.10
Proof. If S is one of the gate sets listed in Theorem 3, then this follows immediately by examining
the reversible circuit constructions in Section 7, for each class in the classification. Building, in
relevant parts, on results by others [25, 6], we will take care in Section 7 to ensure that each non-
affine circuit construction uses at most 2^n · poly(n) gates and O(1) ancilla bits, while each affine
construction uses at most O(n^2) gates and O(1) ancilla bits (most actually use no ancilla bits).
Now suppose S is not one of the sets listed in Theorem 3, but some other set that generates
one of the listed classes. So for example, suppose ⟨S⟩ = ⟨Fredkin, NOT⟩. Even then, we know
that S generates Fredkin and NOT, and the number of gates and ancillas needed to do so is just
some constant, independent of n. Furthermore, each time we need a Fredkin or NOT, we can reuse
the same ancilla bits, by the assumption that those bits are returned to their original states. So
we can simply simulate the appropriate circuit construction from Section 7, using only a constant
factor more gates and O (1) more ancilla bits than the original construction.
10 Here the big-O’s suppress constant factors that depend on the gate set in question.
As we said in Section 1.4, without the classification theorem, it is not obvious how to prove any
upper bound whatsoever on the number of gates or ancillas, for arbitrary gate sets S. Of course,
any circuit that uses T gates also uses at most O (T ) ancillas; and conversely, any circuit that uses
M ancillas needs at most (2^{n+M})! gates, for counting reasons. But the best upper bounds on
either quantity that follow from clone theory and the ideal membership problem appear to have
the form exp (exp (exp (exp (n)))).
A constant number of ancilla bits is sometimes needed, and not only for the trivial reasons that
our gates might act on more than n bits, or only (e.g.) be able to map 0^n to 0^n if no ancillas are
available.
Proposition 9 (Toffoli [28]) If no ancillas are allowed, then there exist reversible transforma-
tions of {0, 1}^n that cannot be generated by any sequence of reversible gates on n − 1 bits or fewer.
Proof. For all k ≥ 1, any (n − k)-bit gate induces an even permutation of {0, 1}^n—since each
cycle is repeated 2^k times, once for every setting of the k bits on which the gate doesn’t act. But
there are also odd permutations of {0, 1}n .
It is also easy to show, using a Shannon counting argument, that there exist n-bit reversible
transformations that require Ω(2^n) gates to implement, and n-bit affine transformations that
require Ω(n^2 / log n) gates. Thus the bounds in Theorem 8 on the number of gates T are, for each
class, off from the optimal bounds only by polylog T factors.
Theorem 10 Besides the trivial, NOT, and NOTNOT classes, every reversible gate class supports
encoded universality if non-affine, or affine encoded universality if affine.
Proof. For ⟨Fredkin⟩, and for all the non-affine classes above ⟨Fredkin⟩, we use the so-called “dual-
rail encoding,” where 0 is encoded by 01 and 1 is encoded by 10. Given three encoded bits, xx̄ yȳ zz̄,
we can simulate a Fredkin gate by applying one Fredkin to x, y, z and another to x, ȳ, z̄, and can also
simulate a CNOT by applying a Fredkin to x, y, ȳ. But Fredkin + CNOT generates everything.
The dual-rail encoding also works for simulating all affine transformations using an F4 gate.
For note that
F4 (xȳy1) = (1, x ⊕ y, x ⊕ ȳ, x)
          = (x, x ⊕ y, x ⊕ ȳ, 1) ,
11 This was proven by Lloyd [19], as well as by Kerntopf et al. [13] and De Vos and Storme [29]; we include a proof
for completeness in Section 8.2.
where we used that we can permute bits for free. So given two encoded bits, xx̄ yȳ, we can simulate
a CNOT from x to y by applying F4 to x, ȳ, y, and one ancilla bit initialized to 1.
For ⟨CNOTNOT⟩, we use a repetition encoding, where 0 is encoded by 00 and 1 is encoded by
11. Given two encoded bits, xxyy, we can simulate a CNOT from x to y by applying a CNOTNOT
from either copy of x to both copies of y. This lets us perform all affine transformations on the
encoded subspace.
The repetition encoding also works for ⟨T4⟩. For notice that
T4 (xyy0) = (0, x ⊕ y, x ⊕ y, x)
= (x, x ⊕ y, x ⊕ y, 0) .
Thus, to simulate a CNOT from x to y, we use one copy of x, both copies of y, and one ancilla bit
initialized to 0.
Finally, for ⟨T6⟩, we encode 0 by 0011 and 1 by 1100. Notice that
T6 (x y y ȳ ȳ 0) = (0, x ⊕ y, x ⊕ y, x ⊕ ȳ, x ⊕ ȳ, x)
                = (x, x ⊕ y, x ⊕ y, x ⊕ ȳ, x ⊕ ȳ, 0) .
So given two encoded bits, x x x̄ x̄ y y ȳ ȳ, we can simulate a CNOT from x to y by using one copy of
x, all four copies of y and ȳ, and one ancilla bit initialized to 0.
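The identities used in this proof are easy to verify mechanically; the following short Python check (ours, matching the equations as given above) confirms the F_4, T_4, and T_6 computations.

```python
from itertools import product

def T(x):
    b = sum(x) % 2
    return tuple(xi ^ b for xi in x)

def F(x):
    b = 1 - (sum(x) % 2)
    return tuple(xi ^ b for xi in x)

for x, y in product((0, 1), repeat=2):
    # F_4 on (x, NOT y, y, 1) gives (1, x^y, NOT(x^y), x): an encoded CNOT up to reordering.
    assert F((x, 1 - y, y, 1)) == (1, x ^ y, 1 - (x ^ y), x)
    # T_4 on (x, y, y, 0) gives (0, x^y, x^y, x).
    assert T((x, y, y, 0)) == (0, x ^ y, x ^ y, x)
    # T_6 on (x, y, y, NOT y, NOT y, 0) gives (0, x^y, x^y, NOT(x^y), NOT(x^y), x).
    assert T((x, y, y, 1 - y, 1 - y, 0)) == (0, x ^ y, x ^ y, 1 - (x ^ y), 1 - (x ^ y), x)
```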
In the proof of Theorem 10, notice that, every time we simulated Fredkin (xyz) or CNOT (xy),
we had to examine only a single bit in the encoding of the control bit x. Thus, Theorem 10 actually
yields a stronger consequence: that given an ordinary, unencoded input string x1 . . . xn , we can use
any non-degenerate reversible gate first to translate x into its encoded version α (x1 ) . . . α (xn ), and
then to perform arbitrary transformations or affine transformations on the encoding.
Finally, we need to show that there are no additional reversible gate classes, besides the ones
listed in Theorem 3. This is by far the most interesting part, and occupies the majority of the
paper. The organization is as follows:
• In Section 6, we collect numerous results about what reversible transformations can and
cannot do to Hamming weights mod k and inner products mod k, in both the affine and the
non-affine cases; these results are then drawn on in the rest of the paper. (Some of them are
even used for the circuit constructions in Section 7.)
• In Section 8, we complete the classification of all non-affine gate sets. In Section 8.1, we show
that the only classes that contain a Fredkin gate are ⟨Fredkin⟩ itself, ⟨Fredkin, NOTNOT⟩,
⟨Fredkin, NOT⟩, ⟨Ck⟩ for k ≥ 3, and ⟨Toffoli⟩. Next, in Section 8.3, we show that every
nontrivial conservative gate generates Fredkin. Then, in Section 8.4, we build on the result
of Section 8.3 to show that every non-affine gate set generates Fredkin.
• In Section 9, we complete the classification of all affine gate sets. For simplicity, we start
with linear gate sets only. In Section 9.1, we show that every nontrivial mod-4-preserving
linear gate generates T6 , and that every nontrivial, non-mod-4-preserving orthogonal gate
generates T4 . Next, in Section 9.2, we show that every non-orthogonal linear gate generates
CNOTNOT. Then, in Section 9.3, we show that every non-parity-preserving linear gate gen-
erates CNOT. Since CNOT generates all linear transformations, this completes the classification
of linear gate sets. Finally, in Section 9.4, we “put back the affine part,” showing that it can
lead to only 8 additional classes besides the linear classes ⟨∅⟩, ⟨T6⟩, ⟨T4⟩, ⟨CNOTNOT⟩, and
⟨CNOT⟩.
Proof. In each case, one just needs to observe that the gate that generates a given class A, satisfies
some invariant violated by the gate that generates another class B. (Here we are using the “gate
definitions” of the classes, which will be proven equivalent to the invariant definitions in Section
7.) So for example, ⟨Fredkin⟩ cannot contain CNOT because Fredkin is conservative; conversely,
⟨CNOT⟩ cannot contain Fredkin because CNOT is affine.
The only tricky classes are those involving NOT and NOTNOT gates: indeed, these classes do
sometimes coincide, as noted in Theorem 3. However, in all cases where the classes are distinct,
their distinctness is witnessed by the following invariants:
• ⟨Fredkin, NOT⟩ and ⟨Fredkin, NOTNOT⟩ are conservative in their linear part.
• ⟨F4, NOT⟩ = ⟨T4, NOT⟩ and ⟨F4, NOTNOT⟩ = ⟨T4, NOTNOT⟩ are orthogonal in their linear
part (isometries).
• ⟨T6, NOT⟩ and ⟨T6, NOTNOT⟩ are orthogonal and mod-4-preserving in their linear part.
As a final remark, even if a reversible transformation is implemented with the help of ancilla
bits, as long as the ancilla bits start and end in the same state a1 . . . ak , they have no effect on any
of the invariants discussed above, and for that reason are irrelevant.
6 Hamming Weights and Inner Products
The purpose of this section is to collect various mathematical results about what a reversible
transformation G : {0, 1}n → {0, 1}n can and cannot do to the Hamming weight of its input, or to
the inner product of two inputs. That is, we study the possible relationships that can hold between
|x| and |G (x)|, or between x · y and G (x) · G (y) (especially modulo various positive integers k).
Not only are these results used heavily in the rest of the classification, but some of them might be
of independent interest.
Theorem 12 There are no mod-shifters for k ≥ 3. In other words: let G be a reversible transfor-
mation on n-bit strings, and suppose that
|G (x)| ≡ |x| + j (mod k)
for all x ∈ {0, 1}^n, where k ≥ 2 and j ≢ 0 (mod k). Then k = 2.

Proof. Suppose the above equation holds for all x. Then introducing a new complex variable z,
we have
z^{|G(x)|} ≡ z^{|x|+j} (mod z^k − 1)
(since working mod z^k − 1 is equivalent to setting z^k = 1). Since the above is true for all x,
Σ_{x∈{0,1}^n} z^{|G(x)|} ≡ z^j Σ_{x∈{0,1}^n} z^{|x|} (mod z^k − 1).   (1)
By reversibility, we have
Σ_{x∈{0,1}^n} z^{|G(x)|} = Σ_{x∈{0,1}^n} z^{|x|} = (z + 1)^n,
so (1) says that z^k − 1 divides (z + 1)^n (z^j − 1). Now, since z^k − 1 has no repeated roots, it can divide (z + 1)^n (z^j − 1) only if it divides (z + 1) (z^j − 1). But for k ≥ 3 and j ≢ 0 (mod k), some kth root of unity is a root of neither z + 1 nor z^j − 1, so this divisibility, and hence the original congruence, holds only if k = 2.
In Appendix 15, we provide an alternative proof of Theorem 12, using linear algebra. The
alternative proof is longer, but perhaps less mysterious.
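The key divisibility fact in the proof above can also be checked symbolically; here is a small illustration (ours) using sympy, confirming that z^k − 1 divides (z + 1)^n (z^j − 1) in the parity case k = 2 but not, for example, when k = 3.

```python
from sympy import symbols, rem

z = symbols('z')

def divides(k, n, j):
    """Does z^k - 1 divide (z + 1)^n * (z^j - 1) as a polynomial?"""
    return rem((z + 1) ** n * (z ** j - 1), z ** k - 1, z) == 0

assert divides(2, 3, 1)        # a parity-flipping gate (k = 2, j = 1) is not ruled out
assert not divides(3, 3, 1)    # but a mod-3 shifter is
assert not divides(4, 5, 2)    # as is, e.g., a mod-4 shifter with j = 2
```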
6.2 Inner Products Mod k
We have seen that there exist orthogonal gates (such as the Tk gates), which preserve inner products
mod 2. In this section, we first show that no reversible gate that changes Hamming weights can
preserve inner products mod k for any k ≥ 3. We then observe that, if a reversible gate is
orthogonal, then it must be linear, and we give necessary and sufficient conditions for orthogonality.
Theorem 13 Let G be a non-conservative reversible transformation on n-bit strings, and suppose
that G (x) · G (y) ≡ x · y (mod k) for all x, y. Then k ≤ 2.

Proof. As in the proof of Theorem 12, we promote the congruence to a congruence over complex
polynomials:
z^{G(x)·G(y)} ≡ z^{x·y} (mod z^k − 1).
Fix a string x ∈ {0, 1}^n such that |G(x)| > |x|, which must exist because G is non-conservative.
Then sum the congruence over all y:
Σ_{y∈{0,1}^n} z^{G(x)·G(y)} ≡ Σ_{y∈{0,1}^n} z^{x·y} (mod z^k − 1).
Observe that
Σ_{y∈{0,1}^n} z^{x·y} = (1 + z)^{|x|} · 2^{n−|x|},
since each coordinate i with x_i = 1 contributes a factor of 1 + z, and each coordinate with x_i = 0 contributes a factor of 2. Similarly,
Σ_{y∈{0,1}^n} z^{G(x)·G(y)} = Σ_{y∈{0,1}^n} z^{G(x)·y} = (1 + z)^{|G(x)|} · 2^{n−|G(x)|},
since summing over all y is the same as summing over all G (y). So we have
(1 + z)^{|G(x)|} · 2^{n−|G(x)|} ≡ (1 + z)^{|x|} · 2^{n−|x|} (mod z^k − 1),
0 ≡ (1 + z)^{|x|} · 2^{n−|G(x)|} · (2^{|G(x)|−|x|} − (1 + z)^{|G(x)|−|x|}) (mod z^k − 1),
or equivalently, letting
p (z) := 2^{|G(x)|−|x|} − (1 + z)^{|G(x)|−|x|},
we find that z^k − 1 divides (1 + z)^{|x|} p (z) as a polynomial. Now, the roots of z^k − 1 lie on the unit
circle centered at 0. Meanwhile, the roots of p (z) lie on the circle in the complex plane of radius
2, centered at −1. The only point of intersection of these two circles is z = 1, so that is the only
root of z^k − 1 that can be covered by p (z). On the other hand, clearly z = −1 is the only root of
(1 + z)^{|x|}. Hence, the only possible roots of z^k − 1 are 1 and −1, so we conclude that k = 2.
We now study reversible transformations that preserve inner products mod 2.
Lemma 14 Every orthogonal gate is linear.

Proof. Suppose
G (x) · G (y) ≡ x · y (mod 2) .
Then for all x, y, z,
G (x ⊕ y) · G (z) ≡ (x ⊕ y) · z
≡x·z+y·z
≡ G (x) · G (z) + G (y) · G (z)
≡ (G (x) ⊕ G (y)) · G (z) (mod 2) .
Since this holds for every G (z), and G (z) ranges over all of {0, 1}^n as z does (G being a bijection), we conclude that G (x ⊕ y) = G (x) ⊕ G (y) for all x, y; hence G is linear.
Corollary 15 Let G be any non-conservative, nonlinear gate. Then for all k ≥ 2, there exist
inputs x, y such that
G (x) · G (y) ≢ x · y (mod k) .
Also:
Proof. This is just the standard characterization of orthogonal matrices; that we are working over
F2 is irrelevant. First, if G preserves inner products mod 2 then for all i ≠ j,

Lemma 17 For all v1 , . . . , vt ∈ {0, 1}^n,
|v1 ⊕ · · · ⊕ vt | = Σ_{∅⊂S⊆[t]} (−2)^{|S|−1} |⋀_{i∈S} vi | ,
where ⋀_{i∈S} vi denotes the bitwise AND of the vi ’s.
Proof. It suffices to prove the lemma for n = 1, since in the general case we are just summing over
all i ∈ [n]. Thus, assume without loss of generality that v1 = · · · = vt = 1. Our problem then
reduces to proving the following identity:
Σ_{i=1}^{t} (t choose i) (−2)^{i−1} = 0 if t is even, and 1 if t is odd,
which follows from the binomial theorem, since Σ_{i=0}^{t} (t choose i) (−2)^{i} = (1 − 2)^t = (−1)^t.
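The lemma itself is also easy to test numerically; a short randomized check (ours) follows.

```python
from itertools import combinations
from functools import reduce
import random

def xor_weight_identity_holds(vectors):
    """Check |v1 ^ ... ^ vt| == sum over nonempty S of (-2)^(|S|-1) * |AND_{i in S} v_i|."""
    t = len(vectors)
    xor_all = reduce(lambda a, b: tuple(p ^ q for p, q in zip(a, b)), vectors)
    rhs = 0
    for r in range(1, t + 1):
        for S in combinations(range(t), r):
            anded = reduce(lambda a, b: tuple(p & q for p, q in zip(a, b)),
                           (vectors[i] for i in S))
            rhs += (-2) ** (r - 1) * sum(anded)
    return sum(xor_all) == rhs

random.seed(0)
for _ in range(100):
    vecs = [tuple(random.randint(0, 1) for _ in range(8)) for _ in range(4)]
    assert xor_weight_identity_holds(vecs)
```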
Lemma 18 Every conservative affine gate is trivial.

Proof. Let G (x) = Ax ⊕ b; then |G (0^n)| = |0^n| = 0 implies b = 0^n. Likewise, |G (ei )| = |ei | = 1
for all i implies that A is a permutation matrix. But then G is trivial.
Theorem 19 If G is a nontrivial linear gate that preserves Hamming weight mod k, then either
k = 2 or k = 4.
Proof. For all x, y, we have
|x| + |y| − 2 (x · y) ≡ |x ⊕ y|
≡ |G (x ⊕ y)|
≡ |G (x) ⊕ G (y)|
≡ |G (x)| + |G (y)| − 2 (G (x) · G (y))
≡ |x| + |y| − 2 (G (x) · G (y)) (mod k) ,
where the first and fourth lines used Lemma 17, the second and fifth lines used that G is mod-k-
preserving, and the third line used linearity. Hence
2 (x · y) ≡ 2 (G (x) · G (y)) (mod k).   (2)
But since G is nontrivial and linear, Lemma 18 says that G is non-conservative. So by Theorem
13, the above equation cannot be satisfied for any odd k ≥ 3. Likewise, if k is even, then (2)
implies
x · y ≡ G (x) · G (y) (mod k/2).
Again by Theorem 13, the above can be satisfied only if k = 2 or k = 4.
In Appendix 15, we provide an alternative proof of Theorem 19, one that does not rely on
Theorem 13.
Theorem 20 Let {o1 , . . . , on } be an orthonormal basis over F2 . An affine transformation F (x) = Ax ⊕ b
is mod-4-preserving if and only if |b| ≡ 0 (mod 4), and the vectors vi := A oi satisfy |vi | + 2 (vi · b) ≡
|oi | (mod 4) for all i and vi · vj ≡ 0 (mod 2) for all i ≠ j.
Proof. First, if F is mod-4-preserving, then
0 ≡ |F (0^n)| ≡ |A 0^n ⊕ b| ≡ |b| (mod 4),
and hence
|oi | ≡ |F (oi )| ≡ |A oi ⊕ b| ≡ |vi ⊕ b| ≡ |vi | + |b| − 2 (vi · b) ≡ |vi | + 2 (vi · b) (mod 4)
for all i, and hence
|oi ⊕ oj | ≡ |F (oi ⊕ oj )| ≡ |vi ⊕ vj ⊕ b|
          ≡ |vi | + |vj | + |b| − 2 (vi · vj ) − 2 (vi · b) − 2 (vj · b) + 4 |vi ∧ vj ∧ b|
          ≡ |vi | + |vj | + 2 (vi · vj ) + 2 (vi · b) + 2 (vj · b) (mod 4)
          ≡ |oi | + |oj | + 2 (vi · vj ) (mod 4)
for all i ≠ j, from which we conclude that vi · vj ≡ 0 (mod 2).
Second, if F satisfies the conditions, then for any x = Σ_{i∈S} oi , we have
|F (x)| = |b ⊕ ⊕_{i∈S} vi |
        = |b| + Σ_{i∈S} |vi | − 2 Σ_{i∈S} (b · vi ) − 2 Σ_{i<j∈S} (vi · vj ) + 4 (· · ·)
        ≡ Σ_{i∈S} (|vi | − 2 (b · vi ))
        ≡ Σ_{i∈S} |oi | (mod 4),
where the second line follows from Lemma 17. Furthermore, we have that
|x| = |⊕_{i∈S} oi | = Σ_{i∈S} |oi | − 2 Σ_{i<j∈S} (oi · oj ) + 4 (· · ·) ≡ Σ_{i∈S} |oi | (mod 4),
where the last equality follows from the fact that {o1 , . . . , on } is an orthonormal basis. Therefore, we
conclude that |F (x)| ≡ |x| (mod 4).
We note two corollaries of Theorem 20 for later use.
Corollary 21 Any linear transformation A ∈ F_2^{n×n} that preserves Hamming weight mod 4 is also
orthogonal.

Corollary 22 An orthogonal transformation A ∈ F_2^{n×n} preserves Hamming weight mod 4 if and
only if all of its columns have Hamming weight 1 mod 4.
7.1 Non-Affine Circuits
We start with the non-affine classes: ⟨Toffoli⟩, ⟨Fredkin⟩, ⟨Fredkin, Ck⟩, and ⟨Fredkin, NOT⟩.
Theorem 23 (variants in [28, 25]) Toffoli generates all reversible transformations on n bits,
using only 2 ancilla bits.13
Proof. Any reversible transformation F : {0, 1}n → {0, 1}n is a permutation of n-bit strings,
and any permutation can be written as a product of transpositions. So it suffices to show how
to use Toffoli gates to implement an arbitrary transposition σy,z : that is, a mapping that sends
y = y1 . . . yn to z = z1 . . . zn and z to y, and all other n-bit strings to themselves.
Given any n-bit string w, let us define w-CNOT to be the (n + 1)-bit gate that flips its last
bit if its first n bits are equal to w, and that does nothing otherwise. (Thus, the Toffoli gate is
11-CNOT, while CNOT itself is 1-CNOT.) Given y-CNOT and z-CNOT gates, we can implement
the transposition σy,z as follows on input x:
Thus, all that remains is to implement w-CNOT using Toffoli. Observe that we can simulate
any w-CNOT using 1^n-CNOT, by negating certain input bits (namely, those for which wi = 0)
before and after we apply the 1^n-CNOT. An example of the transposition σ_{011,101} is given in
Figure 4.
[Figure 4: Generating the transposition σ_{011,101} using Toffoli gates and an ancilla a = 1.]
So it suffices to implement 1^n-CNOT, with control bits x1 . . . xn and target bit y. The base
case is n = 2, which we implement directly using Toffoli. For n ≥ 3, we do the following.
• Let a be an ancilla.
13
Notice that we need at least 2 so that we can generate CNOT and NOT using Toffoli.
28
• Apply 1⌈n/2⌉ -CNOT x1 . . . x⌈n/2⌉ , a .
The crucial point is that this construction works whether the ancilla is initially 0 or 1. In other
words, we can use any bit which is not one of the inputs, instead of a new ancilla. For instance, we
can have one bit dedicated for use in 1n -CNOT gates, which we use in the recursive applications
of 1⌈n/2⌉ -CNOT and 1⌊n/2⌋+1 -CNOT, and the recursive applications within them, and so on.14
2 n
Carefully inspecting the above proof shows that O n 2 gates and 3 ancilla bits suffice to
generate any transformation. Notice the main reason we need two of the three ancillas is to apply
the NOT gate while the ancilla a is active. Case analysis shows that any circuit constructible from
NOT, CNOT, and Toffoli is equivalent to a circuit of NOT gates followed by a circuit of CNOT and
Toffoli gates. For example, see Figure 5. This at most triples the size of the circuit. Therefore,
we can construct a circuit that uses only two ancilla bits: apply the recursive construction, push
the NOT gates to the front, and use two ancilla bits to generate the NOT gates. The recursive
construction itself uses one ancilla bit, plus one more to implement CNOT.
• N N •
• = • •
Figure 5: Example of equivalent Toffoli circuit with NOT gates pushed to the front
The particular construction above was inspired by a result of Ben-Or and Cleve [6], in which
they compute algebraic formulas in a straight-line computation model using a constant number of
registers. We note that Toffoli [28] proved a version of Theorem 23, but with O (n) ancilla bits
rather than O (1). More recently, Shende et al. [25] gave a slightly more complicated construction
which uses only 1 ancilla bit, and also gives explicit bounds on the number of Toffoli gates required
based on the number of fixed points of the permutation. Recall that at least 1 ancilla bit is needed
by Proposition 9.
Next, let CCSWAP, or Controlled-Controlled-SWAP, be the 4-bit gate that swaps its last two
bits if its first two bits are both 1, and otherwise does nothing.
29
Theorem 25 Fredkin generates all conservative transformations on n bits, using only 5 ancilla
bits.
Proof. In this proof, we will use the dual-rail representation, in which 0 is encoded as 01 and 1 is
encoded as 10. We will also use Proposition 24, that Fredkin generates CCSWAP.
As in Theorem 23, we can decompose any reversible transformation F : {0, 1}n → {0, 1}n as
a product of transpositions σy,z . In this case, each σy,z transposes two n-bit strings y = y1 . . . yn
and z = z1 . . . zn of the same Hamming weight.
Given any n-bit string w, let us define w-CSWAP to be the (n + 2)-bit gate that swaps its last
two bits if its first n bits are equal to w, and that does nothing otherwise. (Thus, Fredkin is
1-CSWAP, while CCSWAP is 11-CSWAP.) Then given y-CSWAP and z-CSWAP gates, where
|y| = |z|, as well as CCSWAP gates, we can implement the transposition σy,z on input x as follows:
4. Pair off the i’s such that yi = 1 and zi = 0, with the equally many j’s such that zj = 1 and
yj = 0. For each such (i, j) pair, apply Fredkin (a, xi , xj ).
The logic here is exactly the same as in the construction of transpositions in Theorem 23; the
only difference is that now we need to conserve Hamming weight.
All that remains is to implement w-CSWAP using CCSWAP. First let us show how to imple-
ment 1n -CSWAP using CCSWAP. Once again, we do so using a recursive construction. For the
base case, n = 2, we just use CCSWAP. For n ≥ 3, we implement 1n -CSWAP (x1 , . . . , xn , y, z) as
follows:
The logic is the same as in the construction of 1n -CNOT in Theorem 23 except we now use 2
ancilla bits for the dual rail representation.
Finally, we need to implement w-CSWAP (x1 . . . xn , y, z), for arbitrary w, using 1n -CSWAP.
We do so by first constructing w-CSWAP from NOT gates and 1n -CSWAP. Observe that we only
use the NOT gate on the control bits of the Fredkin gates used during the construction so the
equivalence given in Figure 6 holds (i.e., we can remove the NOT gates).
30
N • N •
= × ×
× × ×
×
Figure 6: Removing NOT gates from the Fredkin circuit
Hence, we can build a w-CSWAP out of CCSWAPs using only 5 ancilla bits: 1 for CCSWAP,
2 for the 1n -CSWAP, and 2 for a transposition.
We note that, before the above construction was found by the authors, unpublished and inde-
pendent work by Siyao Xu and Qian Yu first showed that O(1) ancillas were sufficient.
In [10], the result that Fredkin generates all conservative transformations is stated without
proof, and credited to B. Silver. We do not know how many ancilla bits Silver’s construction used.
Next, we prove an analogue of Theorem 23 for the mod-k-respecting transformations, for all
k ≥ 2. First, let CCk , or Controlled-Ck , be the (k + 1)-bit gate that applies Ck to the final k bits
if the first bit is 1, and does nothing if the first bit is 0.
3. Apply Ck to y1 . . . yk .
4. Repeat step 2.
Theorem 27 Fredkin + CCk generates all mod-k-preserving transformations, for k ≥ 1, using only
5 ancilla bits.
Proof. The proof is exactly the same as that of Theorem 25, except for one detail. Namely, let y
and z be n-bit strings such that |y| ≡ |z| (mod k). Then in the construction of the transposition
σy,z from y-CSWAP and z-CSWAP gates, when we are applying step 5, it is possible that |y| − |z|
is some nonzero multiple of k, say qk. If so, then we can no longer pair off each i such that yi = 1
and zi = 0 with a unique j such that zj = 1 and yj = 0: after we have done that, there will remain
a surplus of ‘1’ bits of size qk, either in y or in z, as well as a matching surplus of ‘0’ bits of size qk
in the other string. However, we can get rid of both surpluses using q applications of a CCk gate
(which we have by Proposition 26), with c as the control bit.
As a special case of Theorem 27, note that Fredkin + CC1 = Fredkin + CNOT generates all
mod-1-preserving transformations—or in other words, all transformations.
We just need one additional fact about the Ck gate.
15
In more detail, use Fredkin gates to swap y1 , y2 with a, b, conditioned on x = 1. Then swap y1 , y2 with a, b
unconditionally.
31
Proposition 28 Ck generates Fredkin, using k − 2 ancilla bits, for all k ≥ 3.
Proof. Let a1 . . . ak−2 be ancilla bits initially set to 1. Then to implement Fredkin on input bits
x, y, z, we apply:
Ck (x, y, a1 . . . ak−2 ) ,
Ck (x, z, a1 . . . ak−2 ) ,
Ck (x, y, a1 . . . ak−2 ) .
Corollary 29 Ck generates all mod-k-preserving transformations for k ≥ 3, using only k+3 ancilla
bits.
Proof. This follows from Proposition 26, if we recall that C2 is equivalent to NOTNOT up to an
irrelevant bit-swap.
Theorem 31 Fredkin + NOT generates all parity-respecting transformations on n bits, using only
6 ancilla bits.
Theorem 32 CNOT generates all affine transformations, with only 1 ancilla bit (or 0 for linear
transformations).
Proof. Let G (x) = Ax ⊕ b be the affine transformation that we want to implement, for some
invertible matrix A ∈ F2n×n . Then given an input x = x1 . . . xn , we first use CNOT gates (at most
n
2 of them) to map x to Ax, by reversing the sequence of row-operations that maps A to the
identity matrix in Gaussian elimination. Finally, if b = b1 . . . bn is nonzero, then for each i such
that bi = 1, we apply a CNOT from an ancilla bit that is initialized to 1.
A simple modification of Theorem 32 handles the parity-preserving case.
Theorem 33 CNOTNOT generates all parity-preserving affine transformations with only 1 ancilla
bit (or 0 for linear transformations).
32
Proof. Let G (x) = Ax ⊕ b be a parity-preserving affine transformation. We first construct
the linear part of G using Gaussian elimination. Notice that for G to be parity-preserving, the
columns vi of A must satisfy |vi | ≡ 1 (mod 2) for all i. For this reason, the row-elimination steps
come in pairs, so we can implement them using CNOTNOT. Notice further that since G is parity-
preserving, we must have |b| ≡ 0 (mod 2). So we can map Ax to Ax ⊕ b, by using CNOTNOT
gates plus one ancilla bit set to 1 to simulate NOTNOT gates.
Likewise (though, strictly speaking, we will not need this for the proof of Theorem 3):
Proof. Use Theorem 33 to map x to Ax, and then use NOT gates to map Ax to Ax ⊕ b.
We now move on to the more complicated cases of hF4 i, hT6 i, and hT4 i.
Proof. Let F (x) = Ax⊕b be an n-bit affine transformation, n ≥ 2, that preserves Hamming weight
mod 4. Using F4 gates, we will show how to map F (x) = y1 . . . yn to x = x1 . . . xn . Reversing the
construction then yields the desired map from x to F (x).
At any point in time, each yj is some affine function of the xi ’s. We say that xi “occurs in”
yj , if yj depends on xi . At a high level, our procedure will consist of the following steps, repeated
up to n − 3 times:
3. Argue that no other xi′ can then occur in that yj . Therefore, we have recursively reduced our
problem to one involving a reversible, mod-4-preserving, affine function on n − 1 variables.
It is not hard to see that the only mod-4-preserving affine functions on 3 or fewer variables, are
permutations of the bits. So if we can show that the three steps above can always be carried out,
then we are done.
First, since A is invertible, it is not the all-1’s matrix, which means that there must be an xi
that does not occur in every yj .
Second, if there are at least three occurrences of xi , then apply F4 to three positions in which xi
occurs, plus one position in which xi does not occur. The result of this is to decrease the number
of occurrences of xi by 2. Repeat until there are at most two occurrences of xi . Since F4 is mod-4-
preserving and affine, the resulting transformation F ′ (x) = A′ x + b′ must still be mod-4-preserving
and affine, so it must still satisfy the conditions of Lemma 20. In particular, no column vector of
A′ can have even Hamming weight. Since two occurrences of xi would necessitate such a column
vector, we know that xi must occur only once.
Third, if xi occurs only once in F ′ (x), then the corresponding column vector vi has exactly
one nonzero element. Since |vi | = 1, we know by Lemma 20 that vi · b ≡ 0 (mod 2), which means
that b has a 0 in the position where vi has a 1. Now consider the row of A′ that includes the
nonzero entry of vi . If any other column vi′ is also nonzero in that row, then vi · vi′ ≡ 1 (mod 2),
which once again contradicts the conditions of Lemma 20. Thus, no other xi′ occurs in the same
33
yj that xi occurs in. Indeed no constant occurs there either, since otherwise F ′ would no longer
be mod-4-preserving. So we have reduced to the (n − 1) × (n − 1) case.
The same argument, with slight modifications, handles hT4 i and hT6 i.
Proof. The construction is identical to that of Theorem 35, except with T4 instead of F4 . When
reducing the number of occurrences of xi to at most 2, Lemma 16 assures us that |vi | ≡ 1 (mod 2).
Proof. The construction is identical to that of Theorem 35, except for the following change.
Rather than using F4 to reduce the number of occurrences of some xi to at most 2, we now use
T6 to reduce the number of occurrences of xi to at most 4. (If there are 5 or more occurrences,
then T6 can always decrease the number by 4.) We then appeal to Corollary 22, which says that
|vi | ≡ 1 (mod 4) for each i. This implies that no xi can occur 2, 3, or 4 times in the output vector.
But that can only mean that xi occurs once.
By Lemma 14 and Corollary 21, an equivalent way to state Theorem 37 is that T6 generates
all affine transformations that are both mod-4-preserving and orthogonal.
All that remains is some “cleanup work” (which, again, is not even needed for the proof of
Theorem 3).
Theorem 38 T6 + NOT generates all affine transformations that are mod-4-preserving (and there-
fore orthogonal) in their linear part.
T6 + NOTNOT generates all parity-preserving affine transformations that are mod-4-preserving
(and therefore orthogonal) in their linear part.
F4 + NOT (or equivalently, T4 + NOT) generates all isometries.
F4 + NOTNOT (or equivalently, T4 + NOTNOT) generates all parity-preserving isometries.
NOT generates all degenerate transformations.
NOTNOT generates all parity-preserving degenerate transformations.
In none of these cases are any ancilla bits needed.
Proof. As in Theorem 34, we simply apply the relevant construction for the linear part (e.g.,
Theorem 36 or 37), then handle the affine part using NOT or NOTNOT gates.
34
• In Section 8.2, we reprove a result of Lloyd [19], showing that every non-affine gate is capable
of universal computation with garbage.
• In Section 8.3, we show that every nontrivial conservative gate generates Fredkin (using the
result of Section 8.2 as one ingredient).
• In Section 8.4, we build on the result of Section 8.3, to show that every non-affine gate
generates Fredkin. This requires our first use of lattices, and also draws on some of the
results about inner products and modularity obstructions from Section 6.
Theorem 39 Every non-affine gate set generates one of the following classes: hFredkini, hCk i for
some k ≥ 3, hFredkin, NOTNOTi, hFredkin, NOTi, or hToffolii.
be the set of possible changes that G can cause to the Hamming weight of its input.
Proposition 40 Let Gbe any non-conservative gate. Then for all integers q, there exists a t such
that q · k (G) ∈ W G⊗t .
Proof. Let γ be the gcd of the elements in W (G). Then clearly G is mod-γ-respecting. By
Proposition 1, this means that γ must divide k (G).16
Now by reversibility, W (G) has both positive and negative elements. But this means that we
can find any integer multiple of γ in some set of the form
Therefore we can find any integer multiple of k (G) in some W G⊗t as well.
We can now characterize all reversible gate sets that contain Fredkin.
Theorem 41 Let G be any gate. Then Fredkin +G generates all mod-k (G)-preserving transfor-
mations (including in the cases k (G) = 1, in which case Fredkin +G generates all transformations,
and k (G) = ∞, in which case Fredkin +G generates all conservative transformations).
Proof. Let k = k (G). If k = ∞ then we are done by Theorem 25, so assume k is finite. We
will assume without loss of generality that G is mod-k-preserving. By Theorem 12, the only other
possibility is that G is parity-flipping, but in that case we can simply repeat everything below with
G ⊗ G, which is parity-preserving and satisfies k (G ⊗ G) = 2, rather than with G itself.
16
Indeed, by using Theorem 12, one can show that γ = k (G), except in the special case that G is parity-flipping,
where we have γ = 1 and k (G) = 2.
35
By Theorem 27, it suffices to use Fredkin +G to generate the CCk gate. Let H be the gate
G ⊗ G−1 , followed by a swap of the two input registers. Observe that H 2 is the identity. Also, by
Proposition 2,
k (H) = gcd k (G) , k G−1 = k.
So by Proposition 40, there exists a positive integer t,as well as inputs y = y1 . . . yn and z = z1 . . . zn
2
such that z = H ⊗t (y) (and y = H ⊗t (z), since H ⊗t = I), and |z| = |y| + k.
We can assume without loss of generality that y has the form 0a 1b —i.e., that its bits are in
sorted order. We would like to sort the bits of z as well. Notice that, since |z| > |y|, there is some
i ∈ [n] such that yi = 0 and zi = 1. So we can easily design a circuit U of Fredkin gates, controlled
by bit i, which reorders the bits of z so that
whereas U (y) = y.
Observe that H ⊗t has a large number of fixed points: we have H (u, G (u)) = (u, G (u)) for any
u; hence any string of the form u1 , G (u1 ) , . . . , ut , G (ut ) is a fixed point of H ⊗t . Call one of these
fixed points w, and let w′ := U (w).
We now consider a circuit R that applies U −1 , followed by H ⊗t , followed by U . This R satisfies
the following identities:
R z ′ = U H ⊗t U −1 z ′ = U H ⊗t (z) = U (y) = y.
R w′ = U H ⊗t U −1 w′ = U H ⊗t (w) = U (w) = w′ .
Using R, we now construct CCk (x1 . . . xk , c). Let A and B be two n-bit registers, initialized to
A := w′ and B := 0a−k x1 . . . xk 1b . Also, let qq be two ancilla bits in dual-rail representation,
initialized to qq = 01. Then to apply CCk , we do the following:
Here each conditional swap is implemented using Fredkin gates; recall from Theorem 25 that
Fredkin generates every conservative transformation.
It is not hard to check that the above sequence maps x1 . . . xk = 0k to 1k and x1 . . . xk = 1k to
k
0 if c = 1, otherwise it maps the inputs to themselves. Furthermore, the ancilla bits are returned
to their original states in all cases, since w′ is a fixed point of R. Therefore we have implemented
CCk .
Theorem 41 has the following corollary.
Corollary 42 Let S be any non-conservative gate set. Then Fredkin +S generates one of the
following classes: hFredkin, NOTNOTi, hFredkin, NOTi, hCk i for some k ≥ 3, or hToffolii.
36
Proof. We know from Proposition 2 that S generates a single gate G such that k (G) = k (S).
If k (S) ≥ 3, then Theorem 41 implies that Fredkin +G generates all k (S)-preserving transfor-
mations, which equals Ck(S) by Corollary 29. If k (S) = 2 and S is parity-preserving, then
Theorem 41 implies that Fredkin +G generates all parity-preserving transformations, which equals
hFredkin, NOTNOTi by Proposition 30. If k (S) = 1, then Theorem 41 implies that Fredkin +G
generates all transformations, which equals hToffolii by Theorem 23.
By Theorem 12, the one remaining case is that k (S) = 2 and some G ∈ S is parity-flipping.
By Theorem 41, certainly Fredkin +G at least generates all parity-preserving transformations.
Furthermore, let F be any parity-flipping transformation. Then F ⊗ G−1 is parity-preserving. So
we can use Fredkin +G to implement F ⊗ G−1 , then compose with G itself to get F . Therefore we
generate all parity-flipping transformations, which equals hFredkin, NOTi by Theorem 31.
Lemma 43 ([19, 29]) Every nontrivial reversible gate G generates NOT with garbage.
Proposition 44 (folklore) For all n ≥ 3, every non-affine Boolean function on n bits has a
non-affine subfunction on n − 1 bits.
Proof. Let f : {0, 1}n → {0, 1} be non-affine, and let f0 and f1 be the (n − 1)-bit subfunctions
obtained by restricting f ’s first input bit to 0 or 1 respectively. If either f0 or f1 is itself non-affine,
then we are done. Otherwise, we have f0 (x) = (a0 · x) ⊕ b0 and f1 (x) = (a1 · x) ⊕ b1 , for some
a0 , a1 ∈ {0, 1}n−1 and b0 , b1 ∈ {0, 1}. Notice that f is non-affine if and only if a0 6= a1 . So there
is some bit where a0 and a1 are unequal. If we now remove any of the other rightmost n − 1 input
bits (which must exist since n − 1 ≥ 2) from f , then we are left with a non-affine function on n − 1
bits.
Lemma 45 ([19, 29]) Every non-affine reversible gate G generates the 2-bit AND gate with garbage.
17
Prompted by the present work, Lloyd has recently posted his 1992 report to the arXiv.
37
Proof. Certainly every non-affine gate is nontrivial, so we know from Lemma 43 that G generates
NOT with garbage. For this reason, it suffices to show that G can generate some non-affine 2-bit
gate with garbage (since all such gates are equivalent to AND under negating inputs and outputs).
Let G (x1 . . . xn ) = y1 . . . yn , and let yi = fi (x1 . . . xn ). Then some particular fi must be a non-
affine Boolean function. So it suffices to show that, by restricting n − 2 of fi ’s input bits, we can
get a non-affine function on 2 bits. But this follows by inductively applying Proposition 44.
By using Lemma 45, it is possible to prove directly that the only classes that contain a CNOT
gate are hCNOTi (i.e., all affine transformations) and hToffolii (i.e., all transformations)—or in
other words, that if G is any non-affine gate, then hCNOT, Gi = hToffolii. However, we will skip
this result, since it is subsumed by our later results.
Recall that COPY is the 2-bit partial gate that maps 00 to 00 and 10 to 11.
Lemma 46 ([19, 13]) Every non-degenerate reversible gate G generates COPY with garbage.
Proof. Certainly every non-degenerate gate is nontrivial, so we know from Lemma 43 that G
generates NOT with garbage. So it suffices to show that there is some pair of inputs x, x′ ∈ {0, 1}n ,
which differ only at a single coordinate i, such that G (x) and G (x′ ) have Hamming distance at
least 2. For then if we set xi := z, and regard the remaining n − 1 coordinates of x as ancillas,
we will find at least two copies of z or z in G (x), which we can convert to at least two copies of z
using NOT gates. Also, if all of the ancilla bits that receive a copy of z were initially 1, then we
can use a NOT gate to reduce to the case where one of them was initially 0.
Thus, suppose by contradiction that G (x) and G (x′ ) are neighbors on the Hamming cube
whenever x and x′ are neighbors. Then starting from 0n and G (0n ), we find that every G (ei )
must be a neighbor of G (0n ), every G (ei ⊕ ej ) must be a neighbor of G (ei ) and G (ej ), and so on,
so that G is just a rotation and reflection of {0, 1}n . But that means G is degenerate, contradiction.
The proof will be slightly more complicated than necessary, but we will then reuse parts of it
in Section 8.4, when we show that every non-affine, non-conservative gate generates Fredkin.
Given a gate Q, let us call Q strong quasi-Fredkin if there exist control strings a, b, c, d such
that
Lemma 48 Let G be any nontrivial n-bit conservative gate. Then G generates a strong quasi-
Fredkin gate.
38
Proof. By conservativity, G maps unit vectors to unit vectors, say G (ei ) = eπ(i) for some per-
mutation π. But since G is nontrivial, there is some input x ∈ {0, 1}n such that xi = 1, but the
corresponding bit π (i) in G (x) is 0. By conservativity, there must also be some bit j such that
xj = 0, but bit π (j) of G (x) is 1. Now permute the inputs to make bit j and bit i the last two
bits, permute the outputs to make bits π (j) and π (i) the last two bits, and permute either inputs
or outputs to make x match G (x) on the first n − 2 bits. After these permutations are performed,
x has the form w01 for some w ∈ {0, 1}n−2 . So
G 0n−2 , 01 = 0n−2 , 01 ,
G 1n−2 , 11 = 11n−2 , 11 ,
where the last two lines again follow from conservativity. Hence G (after these permutations)
satisfies the definition of a strong quasi-Fredkin gate.
Next, call a gate C a catalyzer if, for every x ∈ {0, 1}2n with Hamming weight n, there exists a
“program string” p (x) such that
C (p (x) , 0n 1n ) = (p (x) , x) .
In other words, a catalyzer can be used to transform 0n 1n into any target string x of Hamming
weight n. Here x can be encoded in any manner of our choice into the auxiliary program string
p (x), as long as p (x) is left unchanged by the transformation. The catalyzer itself cannot depend
on x.
Lemma 49 Let Q be a strong quasi-Fredkin gate. Then Q generates a catalyzer.
Proof. Let z := 0n 1n be the string that we wish to transform. For all i ∈ {1, . . . , n} and
j ∈ {n + 1, . . . , 2n}, let sij denote the operation that swaps the ith and j th bit of z. Then consider
the following list of “candidate swaps”:
s1,n+1 , . . . , s1,2n , s2,n+1 , . . . , s2,2n , . . . , sn,n+1 , . . . , sn,2n .
Suppose we go through the list in order from left to right, and for each swap in the list, get to
choose whether to apply it or not. It is not hard to see that, by making these choices, we can map
0n 1n to any x such that |x| = n, by pairing off the first 0 bit that should be 1 with the first 1 bit
that should be 0, the second 0 bit that should be 1 with the second 1 bit that should be 0, and so
on, and choosing to swap those pairs of bits and not any other pairs.
Now, let the program string p (x) be divided into n2 registers r1 , . . . , rn2 , each of the same size.
Suppose that, rather than applying (or not applying) the tth swap sij in the list, we instead apply
the gate F , with rt as the control string, and zi and zj as the target bits. Then we claim that we
can map z to x as well. If the tth candidate swap is supposed to occur, then we set rt := b. If
the tth candidate swap is not supposed to occur, then we set rt to either a, c, or d, depending on
whether zi zj equals 01, 00, or 11 at step t of the swapping process. Note that, because we know x
when designing p (x), we know exactly what zi zj is going to be at each time step. Also, zi zj will
never equal 10, because of the order in which we perform the swaps: we swap each 0 bit zi that
needs to be swapped with the first 1 bit zj that we can. After we have performed the swap, zi = 1
will then only be compared against other 1 bits, never against 0 bits.
Finally:
39
Lemma 50 Let G be any non-affine gate, and let C be any catalyzer. Then G + C generates
Fredkin.
Proof. We will actually show how to generate any conservative transformation F : {0, 1}n →
{0, 1}n .
Since G is non-affine, Lemmas 43, 45, and 46 together imply that we can use G to compute any
Boolean function, albeit possibly with input-dependent garbage.
Let x ∈ {0, 1}n . Then by assumption, C maps 0n 1n to F (x) F (x) using the program string
p(F (x) F (x)). Now, starting with x and ancillas 0n 1n , we can clearly use G to produce
Notice that since F is conservative, we have x, F (x) = n. Therefore, there exists some program
string p(x, F (x)) that can be used as input to C −1 to map x, F (x) to 0n 1n . Again, we can generate
this program string using the fact that G is non-affine:
F (x) , 0n 1n
Thus, let G be a non-affine, non-conservative gate. Starting from G, we will perform a sequence
of transformations to produce gates that are “gradually closer” to Fredkin. Some of these trans-
formations might look a bit mysterious, but they will culminate in a strong quasi-Fredkin gate,
which we already know from Lemmas 49 and 50 is enough to generate a Fredkin gate (since G is
also non-affine).
The first step is to create a non-affine gate with two particular inputs as fixed points.
40
Lemma 52 Let G be any non-affine gate on n bits. Then G generates a non-affine gate H on n2
2 2
bits that acts as the identity on the inputs 0n and 1n .
1. Apply G⊗n to n2 input bits. Let Gi be the ith gate in this tensor product.
2. For all i ∈ [n − 1], swap the ith output bit of Gi with the ith output bit of Gn .
⊗n
3. Apply G−1 .
2 2 2 2
It is easy to see that H maps 0n to 0n and 1n to 1n . (Indeed, H maps every input that
consists of an n-bit string repeated n times to itself.) To see that H is also non-affine, first notice
that G−1 is non-affine. But we can cause any input x = x1 . . . xn that we like to be fed into the
final copy of G−1 , by encoding that input “diagonally,” with each Gi producing xi as its ith output
bit. Therefore H is non-affine.
As a remark, with all the later transformations we perform, we will want to maintain the
property that the all-0 and all-1 inputs are fixed points. Fortunately, this will not be hard to
arrange.
Let H be the output of Lemma 52. If H is conservative (i.e., k (H) = ∞), then H already
generates Fredkin by Theorem 47, so we are done. Thus, we will assume in what follows that k (H)
is finite. We will further assume that H is mod-k (H)-preserving. By Theorem 12, the only gates
H that are not mod-k (H)-preserving are the parity-flipping gates—but if H is parity-flipping, then
H ⊗ H is parity-preserving, and we can simply repeat the whole construction with H ⊗ H in place
of H.
Now we want to show that we can use H to decrease the inner product between a pair of inputs
by exactly 1 mod m, for any m we like.
Lemma 53 Let H be any non-conservative, nonlinear gate. Then for all m ≥ 2, there is a positive
integer t, and inputs x, y, such that
Proof. Let m = pα1 1 pα2 2 . . . pαs s where each pi is a distinct prime. By Corollary 15, we know that
for each pi , there is some pair of inputs xi , yi such that
41
Here di represents the number of times the pair (xi , yi ) occurs in (x, y). By construction, no pi
divides γi , and since the pi ’s are distinct primes, they have no common factor. This implies that
gcd (γ1 , . . . , γs , m) = 1. So by the Chinese Remainder Theorem, a solution to (7) exists.
Note also that, if H maps the all-0 and all-1 strings to themselves, then H ⊗t does so as well.
To proceed further, it will be helpful to introduce some terminology. Suppose that we have
two strings x = x1 . . . xn and y = y1 . . . yn . For each i, the pair xi yi has one of four possible values:
00, 01, 10, or 11. Let the type of (x, y) be an ordered triple (a, b, c) ∈ Z3 , which simply records
the number of occurrences in (x, y) of each of the three pairs 01, 10, and 11. (It will be convenient
not to keep track of 00 pairs, since they don’t contribute to the Hamming weight of either x or y.)
Clearly, by applying swaps, we can convert between any pairs (x, y) and (x′ , y ′ ) of the same type,
provided that x, y, x′ , y ′ all have the same length n.
Now suppose that, by repeatedly applying a gate H, we can convert some input pair (x, y) of
type (a, b, c) into some pair (x′ , y ′ ) of type (a′ , b′ , c′ ). Then we say that H generates the slope
a′ − a, b′ − b, c′ − c .
Note that, if H generates the slope (p, q, r), then by inverting the transformation, we can also
generate the slope (−p, −q, −r). Also, if H generates the slope (p, q, r) by acting on the input pair
(x, y), and the slope (p′ , q ′ , r ′ ) by acting on (x′ , y ′ ), then it generates the slope (p + p′ , q + q ′ , r + r ′ )
by acting on (xx′ , yy ′ ). For these reasons, the achievable slopes form a 3-dimensional lattice—that
is, a subset of Z3 closed under integer linear combinations—which we can denote L (H).
What we really want is for the lattice L (H) to contain a particular point: (1, 1, −1). Once we
have shown this, we will be well on our way to generating a strong quasi-Fredkin gate. We first
need a general fact about slopes.
Lemma 54 Let H map the all-0 input to itself. Then L (H) contains the points (k (H) , 0, 0),
(0, k (H) , 0), and (0, 0, k (H)).
Proof. Recall from Proposition 40 that there exists a t, and an input w, such that H ⊗t (w) =
|w| + k (H). Thus, to generate the slope (k (H) , 0, 0), we simply need to do the following:
• Choose an input pair (x, y) with sufficiently many xi yi pairs of the forms 10 and 00.
• Apply H ⊗t to a subset of bits on which x equals w, and y equals the all-0 string.
Doing this will increase the number of 10 pairs by k (H), while not affecting the number of 01
or 11 pairs.
To generate the slope (0, k (H) , 0), we do exactly the same thing, except that we reverse the
roles of x and y.
Finally, to generate the slope (0, 0, k (H)), we choose an input pair (x, y) with sufficiently many
xi yi pairs of the forms 11 and 00, and then use the same procedure to increase the number of 11
pairs by k (H).
We can now prove that (1, 1, −1) is indeed in our lattice.
Lemma 55 Let H be a mod-k (H)-preserving gate that maps the all-0 input to itself, and suppose
there exist inputs x, y such that
H(x) · H(y) − x · y ≡ −1 (mod k (H)) .
Then (1, 1, −1) ∈ L (H).
42
Proof. The assumption implies directly that H generates a slope of the form (p, q, −1 + rk (H)), for
some integers p, q, r. Thus, Lemma 54 implies that H also generates a slope of the form (p, q, −1),
via some gate G ∈ hHi acting on inputs (x, y). Now, since H is mod-k (H)-preserving, we have
|G (x)| ≡ |x| (mod k (H)) and |G (y)| ≡ |y| (mod k (H)). But this implies that p ≡ 1 (mod k (H))
and q ≡ 1 (mod k (H)). So, again using Lemma 54, we can generate the slope (1, 1, −1).
Combining Lemmas 52, 53, and 55, we can summarize our progress so far as follows.
Corollary 56 Let G be any non-affine, non-conservative gate. Then either G generates Fredkin,
or else it generates a gate H that maps the all-0 and all-1 inputs to themselves, and that also
satisfies (1, 1, −1) ∈ L (H).
We now explain the importance of the lattice point (1, 1, −1). Given a gate Q, let us call Q
weak quasi-Fredkin if there exist strings a and b such that
Then:
Lemma 57 A gate H generates a weak quasi-Fredkin gate if and only if (1, 1, −1) ∈ L (H).
Proof. If H generates a weak quasi-Fredkin gate Q, then applying Q to the input pair (a, 01) and
(b, 01) directly generates the slope (1, 1, −1). For the converse direction, if H generates the slope
(1, 1, −1), then by definition there exists a gate Q ∈ hHi, and inputs x, y, such that |Q (x)| = |x|
and |Q (y)| = |y|, while
Q (x) · Q (y) = x · y − 1.
In other words, applying Q decreases by one the number of 1 bits on which x and y agree, while
leaving their Hamming weights the same. But in that case, by permuting input and output bits,
we can easily put Q into the form of a weak quasi-Fredkin gate.
Next, recall the definition of a strong quasi-Fredkin gate from Section 8.3. Then combining
Corollary 56 with Lemma 57, we have the following.
Corollary 58 Let G be any non-affine, non-conservative gate. Then either G generates Fredkin,
or else it generates a strong quasi-Fredkin gate.
Proof. Combining Corollary 56 with Lemma 57, we find that either G generates Fredkin, or else
it generates a weak quasi-Fredkin gate that maps the all-0 and all-1 strings to themselves. But
such a gate is a strong quasi-Fredkin gate, since we can let c be the all-0 string and d be the all-1
string.
Combining Corollary 58 with Lemmas 49 and 50 now completes the proof of Theorem 51:
that every non-affine, non-conservative gate generates Fredkin. However, since every non-affine,
conservative gate generates Fredkin by Theorem 47, we get the following even broader corollary.
Finally, combined with Corollary 42, Corollary 59 completes the proof of Theorem 39, that
every non-affine gate set generates either hFredkini, hFredkin, NOTNOTi, hFredkin, NOTi, hCk i
for some k ≥ 3, or hToffolii.
43
9 The Affine Part
Having completed the classification of the non-affine classes, in this section we turn our attention
to proving that there are no affine classes besides the ones listed in Theorem 3: namely, the trivial,
T6 , T4 , F4 , CNOTNOT, and CNOT classes, as well as various extensions of them by NOTNOT
and NOT gates.
To make the problem manageable, we start by restricting attention to the linear parts of affine
transformations (i.e., if a transformation has the form G (x) = Ax ⊕ b, we ignore the additive
constant b). We show that the only possibilities for the linear part are: the identity, all mod-4-
preserving orthogonal transformations, all orthogonal transformations, all parity-preserving linear
transformations, or all linear transformations. This result, in turn, is broken into several pieces:
• In Section 9.1, we show that any mod-4-preserving orthogonal gate generates all mod-4-
preserving orthogonal transformations, and that any non-mod-4-preserving orthogonal gate
generates all orthogonal transformations.
• In Section 9.2, we show that every non-orthogonal, parity-preserving linear gate generates
CNOTNOT. This again requires “slope theory” and the analysis of a 3-dimensional lattice.
It also draws on the results of Section 6.3, which tell us that it suffices to restrict attention
to the case k (G) = 2.
• In Section 9.3, we show that every non-parity-preserving linear gate generates CNOT. In
this case we are lucky that we only need to analyze a 1-dimensional lattice (i.e., an ideal in
Z)
Finally, in Section 9.4, we complete the classification by showing that including the affine
parts can yield only the following additional possibilities: NOTNOT, NOT, F4 , F4 + NOTNOT,
F4 + NOT, T6 + NOTNOT, T6 + NOT, or CNOTNOT + NOT. Summarizing, the results of this
section will imply the following.
Theorem 60 Any set of affine gates generates one of the following 13 classes: h∅i, hNOTNOTi,
hNOTi, hT6 i, hT6 , NOTNOTi, hT6 , NOTi, hT4 i, hF4 i, hT4 , NOTNOTi, hT4 , NOTi, hCNOTNOTi,
hCNOTNOT, NOTi, or hCNOTi.
Together with Theorem 39, this will then complete the proof of Theorem 3.
Proof. We first describe how to simulate T6 (x1 . . . x6 ), using three applications of T4k+2 . Let
bx := x1 ⊕ · · · ⊕ x6 . Also, let a be a string of ancilla bits, initialized to 02k−2 . Then:
44
2. Swap out 2k − 2 of the bx bits with the ancilla string a = 02k−2 , and apply T4k+2 again. This
yields
T4k+2 02k−2 , bx2k−2 , x1 ⊕ bx , . . . , x6 ⊕ bx = bx2k−2 , 02k−2 , x1 . . . x6 ,
3. Swap the 2k − 2 bits that are now 0 with a = bx2k−2 , and apply T4k+2 a third time. This
returns a to 02k−2 , and yields
T4k+2 bx4k−4 , x1 . . . x6 = T4k+2 04k−4 , x1 ⊕ bx , . . . x6 ⊕ bx .
Thus, we have successfully applied T6 to x1 . . . x6 . The same sequence of steps can be used to
simulate T4 (x1 . . . x4 ) using three applications of T4k .
We can now show that there is only one nontrivial orthogonal class that is also mod-4-preserving:
namely, hT6 i.
Proof. Let G (x) = Ax, for some A ∈ F2n×n . Then recall from Corollary 21 that A is orthogonal.
By Lemma 16, this implies that A−1 = AT , so G can also generate AT .
Let B be the (n + 1) × (n + 1) matrix that acts as the identity on the first bit, and as A on
bits 2 through n + 1. Observe that B T acts as the identity on the first bit, and as AT on bits
2 through n + 1. Also, since A preserves Hamming weight mod 4, so do AT , B, and B T . By
Corollary 22, this implies that each of B T ’s column vectors must have Hamming weight 1 mod
4. Furthermore, since A is nontrivial, there must be some column of B T with Hamming weight
4k + 1, for some k ≥ 1. Then by swapping rows and columns, we can get B T into the form
1 0 0 ··· 0
0 1 —v1 —
.. .. ..
. . .
0 1 —v4k+1 — ,
0 0 —v4k+2 —
.. .. ..
. . .
0 0 —vn —
where v1 , . . . , vn are row vectors each of length n − 1. Let δij equal 1 if i = j or 0 otherwise. Then
note that by orthogonality,
δij if i, j ≤ 4k + 1,
vi · vj =
δij otherwise.
Now let C T be the matrix obtained by swapping the first two columns of B T . Then we claim
that C T B yields a T4k+2 transformation. Since T4k+2 generates T6 by Lemma 61, we will be done
after we have shown this.
45
We have
0 1 0 ··· 0
1 0 —v1 —
1 0 ··· 0 0 ··· 0
.. .. ..
0 1 ··· 1 0 ··· 0
. . .
0 |
T | | |
C B= 1 0 —v4k+1 —
.
0 0 —v4k+2 — .. v1T T T · · · vnT
· · · v4k+1 v4k+2
.. .. ..
0 | | | |
. . .
0 0 —vn —
0 1 1 0 0 0
..
1 . 1 0 0 0
1 1 0 0 0 0
= .
0 0 0 1 0 0
.
0 0 0 .. 0
0
0 0 0 0 0 1
One can check that the above transformation is actually T4k+2 on the first 4k + 2 bits, and the
identity on the rest.
Likewise, there is only one orthogonal class that is not mod-4-preserving: namely, hT4 i.
Theorem 63 Let G be any nontrivial orthogonal gate that does not preserve Hamming weight mod
4. Then G generates T4 .
Proof. We use essentially the same construction as in Theorem 62. The only change is that
Corollary 22 now tells us that there must be a column of B T with Hamming weight 4k + 3 for
some k ≥ 1, so we use that in place of the column with Hamming weight 4k + 1. This leads to an
(n + 1) × (n + 1) matrix C T B, which acts as T4k+4 on the first 4k + 4 bits and as the identity on
the rest. But T4k+4 generates T4 by Lemma 61, so we are done.
The main idea of the proof is as follows. Let CPD, or Copying with a Parity Dumpster, be the
following partial reversible gate:
46
In other words, CPD maps x0y to x, x, x ⊕ y—copying x, but also XORing x into the y “dumpster”
in order to preserve the total parity. Notice that CPD is consistent with CNOTNOT; indeed, it is
simply the restriction of CNOTNOT to inputs whose second bit is 0. Notice also that, whenever
we have a 3-bit string of the form xxy, we can apply CPD in reverse to get x, 0, x ⊕ y.
Then we will first observe that CPD generates CNOTNOT. We will then apply the theory
of types and slopes, which already made an appearance in Section 8.4, to show that any non-
orthogonal linear gate generates CPD: in essence, that there are no modularity or other obstructions
to generating it.
Lemma 65 Let G be any gate that generates CPD. Then G generates CNOTNOT (or equiva-
lently, all parity-preserving linear transformations).
Proof. Let F : {0, 1}n → {0, 1}n be any reversible, parity-preserving linear transformation. Then
we can generate the following sequence of states:
for some garbage strings gar (x) and gar (F (x)). Here the first line computes F (x) from x; the
second line applies CPD to copy F (x) (using a single “dumpster” bit for each bit of F (x)); the
third line uncomputes F (x); the fourth line computes a second copy of x from F (x); the fifth line
applies CPD in reverse to erase one of the copies of x (reusing same dumpster bit from before); and
the sixth line uncomputes x. Also, |x| + |F (x)| ≡ 0 (mod 2) follows because F is parity-preserving.
So, given a non-orthogonal, parity-preserving linear gate G, we now need to show how to
implement CPD.
For the rest of this section, we will consider a situation where we are given an n-bit string, with
the initial state xy0n−2 (where x and y are two arbitrary bits), and then we apply a sequence of
F2 linear transformations to the string. Here we do not assume that ancilla bits initialized to 1
are available, though ancilla bits initialized to 0 are fine. As a result, at every time step, every bit
in our string will be either x, y, x ⊕ y, or 0. Because we are studying only the linear case here,
not the affine case, we do not need to worry about the possibilities x ⊕ 1, y ⊕ 1, etc., which would
considerably complicate matters. (We will handle the affine case in Section 9.4.)
By analogy to Section 8.4, let us define the type of a string z (x, y) ∈ {0, 1}n to be (a, b, c), if
z contains a copies of x and b copies of y and c copies of x ⊕ y. Since any string of type (a, b, c)
can be transformed into any other string of type (a, b, c) using bit-swaps, the type of z is its only
relevant property. As before, if by repeatedly applying a linear gate G, we can map some string
of type (a, b, c) into some string of type (a′ , b′ , c′ ), then we say that G generates the slope
a′ − a, b′ − b, c′ − c .
47
Again, if G generates the slope (p, q, r), then G−1 generates the slope (−p, −q, −r). Also, if G
generates the slope (p, q, r) using the string z, and the slope (p′ , q ′ , r ′ ) using the string z ′ , then
it generates the slope (p + p′ , q + q ′ , r + r ′ ) using the string zz ′ . For these reasons, the set of
achievable slopes forms a 3-dimensional lattice, which we denote L (G) ⊆ Z3 . Moreover, this is a
lattice with a strong symmetry property:
Proof. Clearly we can interchange the roles of x and y. However, we can also, e.g., define x′ := x
and y ′ := x ⊕ y, in which case x′ ⊕ y ′ = y. In the triple (x, y, x ⊕ y), each element is the XOR of
the other two.
Just like before, our entire question will boil down to whether or not the lattice L (G) contains
a certain point. In this case, the point is (1, −1, 1). The importance of the (1, −1, 1) point comes
from the following lemma.
Lemma 67 Let G be any linear gate. Then G generates CPD, if and only if (1, −1, 1) ∈ L (G).
Proof. If G generates CPD, then it maps x0y, which has type (1, 1, 0), to x, x, x ⊕ y, which has
type (2, 0, 1). This amounts to generating the slope (1, −1, 1).
Conversely, suppose (1, −1, 1) ∈ L (G). Then there is some gate H ∈ hGi, and some string of
the form z = xa y b (x ⊕ y)c , such that
But the very fact that G generates such an H implies that G is non-degenerate, and if G is non-
degenerate, then Lemma 46 implies that, starting from xy0n−2 , we can use G to increase the
numbers of x, y, and x ⊕ y simultaneously without bound. That is, there is some Q ∈ hGi such
that (omitting the 0 bits)
′ ′ ′
Q (xy) = xa y b (x ⊕ y)c ,
where a′ > a and b′ > b and c′ > c. So then the procedure to implement CPD is to apply Q, then
H, then Q−1 .
Thus, our goal now is to show that, if G is any non-orthogonal, parity-preserving linear gate,
then (1, −1, 1) ∈ L (G). Observe that, if k (G) = 4, then Corollary 21 implies that G is orthogonal,
contrary to assumption. By Theorem 19, this means that the only remaining possibility is k (G) =
2. This has the following consequence for the lattice L (G).
Proposition 68 If G is a linear gate with k (G) ≤ 2, then L (G) contains all even points (i.e., all
(p, q, r) such that p ≡ q ≡ r ≡ 0 (mod 2)).
Proof. By Proposition 40, we must be able to use G to map 10n−1 to 1110n−3 . Since 0n is mapped
to itself by any linear transformation, this implies that G can map x0n−1 to xxx0n−3 , which means
that it generates the slope (2, 0, 0). So (2, 0, 0) ∈ L (G). By Proposition 66, then, L (G) also
contains the points (0, 2, 0) and (0, 0, 2). But these three generate all the even points.
Proposition 68 has the following immediate corollary.
Corollary 69 Let G be a linear gate with k (G) ≤ 2, and suppose L (G) contains any point (p, q, r)
such that p ≡ q ≡ r ≡ 1 (mod 2). Then L (G) contains (1, −1, 1).
48
Thus, it remains only to prove the following lemma.
Lemma 70 Let G be any parity-preserving, non-orthogonal linear gate. Then L (G) contains a
point (p, q, r) such that p ≡ q ≡ r ≡ 1 (mod 2).
Proof. In the proof of Theorem 64, this is the first place where we use the linearity of G in an
essential way—i.e., not just to deduce that k (G) ∈ {2, 4}, or to avoid dealing with bits of the form
x ⊕ 1, y ⊕ 1, etc. It is also the first place where we use the non-orthogonality of G, other than to
rule out the possibility that k (G) = 4; and the first place where we use that G is parity-preserving.
Let us view G as an n × n matrix over F2 . Then the fact that G is parity-preserving means
that every column of G has odd Hamming weight. Also, the fact that G is non-orthogonal means
that it must have two columns with an odd inner product. Assume without loss of generality that
these are the first and second columns. Let the first two columns of G consist of:
where a, b, c, d are nonnegative integers summing to n. Then from the above, we have that a + b
and b + c and b are all odd, from which it follows that a and c are even.
Now consider applying G to the input xy0n−2 . The result will contain:
a copies of x,
c copies of y,
b copies of x ⊕ y.
This means that we’ve mapped a string of type (1, 1, 0) to a string of type (a, c, b), thereby generating
the slope (a − 1, c − 1, b). But this is the desired odd point in L (G).
Combining Lemma 65, Lemma 67, Corollary 69, and Lemma 70 now completes the proof of
Theorem 64.
Theorem 71 Let G be any non-parity-preserving linear gate. Then G generates CNOT (or equiv-
alently, all linear transformations).
Recall that COPY is the partial gate that maps x0 to xx. We will first show how to use G to
generate COPY, and then use COPY to generate CNOT.
Note that since G is linear, it cannot be parity-flipping. So since G is non-parity-preserving, it
is also non-parity-respecting, and k (G) must be finite and odd. But by Theorem 19, this means
that k (G) = 1: in other words, G is non-mod-respecting.
Let z be an n-bit string that consists entirely of copies of x and 0. Let the type of z be the
number of copies of x. Clearly we can map any z to any other z of the same type using swaps, so
49
the type of z is its only relevant property. Also, we say that a gate G generates the slope p, if by
applying G repeatedly, we can map some input z of type a to some input z ′ of type a+p. Note that
if G generates the slope p, then by reversibility, it also generates the slope −p. Also, if G generates
the slope p by mapping z to z ′ , and the slope q by mapping w to w′ , then it generates the slope p+q
by mapping zw to z ′ w′ . For these reasons, the set of achievable slopes forms an ideal in Z (i.e., a
1-dimensional lattice), which we can denote L (G). The question of whether G generates COPY
can then be rephrased as the question of whether L (G) contains 1—or equivalently, of whether
L (G) = Z.
Proof. If G generates COPY, then clearly 1 ∈ L (G). For the converse direction, suppose
1 ∈ L (G). Then G can be used to map an input of type a to an input of type a + 1, for some a.
Hence G can also be used to map inputs of type b to inputs of type b + 1, for all b ≥ a. This also
implies that G is non-degenerate, so by Lemma 46, it can be used to increase the number of copies
of x without bound. So to copy a bit x, we first apply some gate H ∈ hGi to map x to xb for some
b ≥ a, then map xb to xb+1 , and finally apply H −1 to map xb+1 to x2 .
Now, the question of whether 1 ∈ L (G) is easily answered.
Proof. This follows almost immediately from Proposition 40, together with the fact that k (G) = 1.
We simply need to observe that, if x = 1, then the number of copies of x corresponds to the
Hamming weight.
Finally, we show that COPY suffices for CNOT.
Lemma 74 Let G be any linear gate that generates COPY. Then G generates CNOT.
Proof. We will actually prove that G generates any linear transformation F . Observe that, if G
generates COPY, then it must be non-degenerate. Therefore, by copying bits whenever needed,
and using G to do computation on them, clearly we can map the input x to a string of the form
for some garbage string gar (x). Since G generates COPY, we can then make one copy of F (x),
mapping the above to
x, gar (x) , F (x) , F (x) .
Next we can uncompute the computation of F to get
x, F (x) .
50
Finally, we can uncompute the computation of x to get F (x) alone.
Combining Lemmas 72, 73, and 74 now completes the proof of Theorem 71. Then combining
Theorems 32, 33, 36, 37, 62, 63, 64, and 71, we can summarize our progress on the linear case as
follows.
Corollary 75 Every set of linear gates generates either h∅i, hT6 i, hT4 i, hCNOTNOTi, or hCNOTi.
Proof. To implement NOTNOT (x, y), apply NOT⊗k to x, a1 . . . ak−1 and then to y, a1 . . . ak−1 . To
⊗k
implement NOT (x), let ℓ := k−1 2 . Apply NOT to x, a1 . . . aℓ , b1 . . . bℓ , then x, a1 . . . aℓ , c1 . . . cℓ ,
then x, b1 . . . bℓ , c1 . . . cℓ .
More generally:
Lemma 77 Let G be any gate of the form NOT ⊗H. Then G generates NOTNOT.
Proof. To implement NOTNOT (x, y), first apply G to x, a where a is some ancilla string; then
apply G−1 to y, a.
Also:
Proof. First we use G⊗2 to map x, 0n to Ax ⊕ b, b; then we use NOTNOT gates to map Ax ⊕ b, b
to Ax, 0n .
By combining Lemmas 77 and 78, we obtain the following.
Corollary 79 (Cruft Removal) Let G (x) = Ax ⊕ b be an n-bit affine gate. Suppose A applies
a linear transformation A′ to the first m bits of x, and acts as the identity on the remaining n − m
bits. Then G generates an m-bit gate of the form H (x) = A′ x ⊕ c.
Proof. If bi = 0 for all i > m, then we are done. Otherwise, we can use Lemma 77 to generate
NOTNOT, and then Lemma 78 to generate H (x) = A′ x.
Lemma 80 Let S be any class of parity-preserving linear or affine gates. Then there are no
classes between hSi and hS + NOTi other than hS + NOTNOTi.
Proof. Let G be a transformation that is generated by S + NOT but not by S. Then we need to
show how to generate NOT or NOTNOT themselves using S + G.
We claim that G acts as
G (x) = V (x) ⊕ b,
where V (x) is some parity-preserving affine transformation generated by S, and b is some nonzero
string. First, V must be generated by S because, given any circuit for G over the set S + NOT,
51
we can always push the NOT gates to the end; this leaves us with a circuit for the “S part” of G.
(This is the one place where we use that S is affine.) Also, b must be nonzero because otherwise,
G would already be generated by S.
Given x, suppose we first apply V −1 (which must be generated by S), then apply G. This
yields
G V −1 (x) = V V −1 (x) ⊕ b = x ⊕ b,
which is equivalent to NOT⊗k for some nonzero k. By Lemma 76, this generates NOTNOT. If
|b| is always even, then since V is parity-preserving, clearly we remain within hS + NOTNOTi. If,
on the other hand, |b| is ever odd, then again by Lemma 76, we can generate NOT.
We can finally complete the proof of Theorem 60, characterizing the possible affine classes.
Proof of Theorem 60. If we restrict ourselves to the linear part of the class, then we know from
Corollary 75 that the only possibilities are hCNOTi, hCNOTNOTi, hT4 i, hT6 i, and h∅i (i.e., the
trivial class). We will handle these possibilities one by one.
Linear part is hCNOTi. Since CNOT can already generate all affine transformations (by
Theorem 32), using an ancilla bit initialized to 1, we have hSi ⊆ hCNOTi. For the other direction,
Corollary 79 implies that S must generate a gate of the form CNOT (x) ⊕ b, for some b ∈ {0, 1}2 .
However, it is not hard to see that all such gates can generate CNOT itself.
Linear part is hCNOTNOTi. Here we clearly have hSi ⊆ hCNOTNOT, NOTi. Meanwhile,
Corollary 79 again implies that S generates a gate of the form G (x) = CNOTNOT (x) ⊕ b, for some
b ∈ {0, 1}3 . Suppose the first bit of b is 1; this is the bit that corresponds to the control of the
CNOTNOT. Then G (G (x)) generates NOTNOT, so by Lemma 78, we can generate CNOTNOT.
If, on the other hand, the first bit of b is 0, then G generates NOT or NOTNOT directly, so we can
again use Lemma 78 to generate CNOTNOT. Therefore hSi lies somewhere between hCNOTNOTi
and hCNOTNOT, NOTi. But since CNOTNOT already generates NOTNOT, Lemma 80 says that
the only possibilities are hCNOTNOTi and hCNOTNOT, NOTi.
Linear part is hT4 i. In this case hSi ⊆ hT4 , NOTi. Again Corollary 79 implies that S
generates a gate of the form G (x) = T4 (x) ⊕ b, for some b ∈ {0, 1}4 . If b = 1111, then S generates
F4 . So hSi lies somewhere between hF4 i and hF4 , NOTi = hT4 , NOTi, but then Lemma 80 ensures
that hF4 i, hF4 , NOTNOTi = hT4 , NOTNOTi, and hT4 , NOTi are the only possibilities. Likewise,
if b = 0000, then S generates T4 , so hT4 i, hT4 , NOTNOTi, and hT4 , NOTi are the only possibilities.
Next suppose |b| is odd. Then G (G (x)) = NOT⊗4 (x), which generates NOTNOT by Lemma
76. So by Lemma 78, we generate T4 as well. Thus we have at least hT4 , NOTNOTi. But since
G itself is parity-flipping, hSi is not parity-preserving, leaving hT4 , NOTi as the only possibility by
Lemma 80. Finally suppose |b| = 2: without loss of generality, b = 1100. Let Q be an operation
that swaps the first two bits of x with the last two bits. Then G (Q (G (x))) is equivalent to
NOT⊗4 (x) up to swaps, so again we have at least hT4 , NOTNOTi, leaving hT4 , NOTNOTi and
hT4 , NOTi as the only possibilities.
Linear part is hT6 i. In this case hSi ⊆ hT6 , NOTi. Again, Corollary 79 implies that
S generates G(x) = T6 (x) ⊕ b for some b ∈ {0, 1}6 . If b = 000000, then S generates T6 , so
hT6 i, hT6 , NOTNOTi, and hT6 , NOTi are the only possibilities by Lemma 80. If |b| is odd, then
G (G (x)) = NOT⊗6 (x). By Lemma 76, this means that S generates NOTNOT, so by Lemma 78, it
generates T6 as well. But G is parity-flipping, leaving hT6 , NOTi as the only possibility by Lemma
80. If |b| is 2 or 4, then by an appropriate choice of swap operation Q, we can cause G (Q (G (x)))
to generate NOTNOT, so again hT6 , NOTNOTi and hT6 , NOTi are the only possibilities.
52
Finally, if b = 111111, then G(x) = F6 (x). In this case we start with the operation
F6 (x00000) = 1xxxxx
Using three of the x outputs and three fresh 0 ancilla bits, we then perform
F6 (xxx000) = 111xxx
Next, bringing the xxx outputs together with the remaining xx outputs and one fresh 0 ancilla bit,
we apply
F6 (xxxxx0) = 11100x
In summary, we have performed a NOT (x) operation with some garbage still around. However,
if we repeat this entire procedure 6 times, then the Hamming weight of the garbage will be a
multiple of 6. We can remove all this of garbage using the F6 gate. Therefore, we have created
a NOT⊗6 gate, which generates NOTNOT by Lemma 76. So again we can generate T6 , leaving
hT6 , NOTNOTi and hT6 , NOTi as the only possibilities by Lemma 78.
Linear part is h∅i. In this case hSi ⊆ hNOTi, so Lemma 80 implies that the only possibilities
are h∅i, hNOTNOTi, and hNOTi.
10 Open Problems
As discussed in Section 1, the central challenge we leave is to give a complete classification of all
quantum gate sets acting on qubits, in terms of which unitary transformations they can generate
or approximate. Here, just like in this paper, one should assume that qubit-swaps are free, and
that arbitrary ancillas are allowed as long as they are returned to their initial states.
A possible first step—which would build directly on our results here—would be to classify all
possible quantum gate sets within the stabilizer group, which is a quantum generalization of the
group of affine classical reversible transformations. Since the stabilizer group is discrete, here
at least there is no need for representation theory, Lie algebras, or any notion of approximation,
but the problem still seems complicated. A different step in the direction we want, which would
involve Lie algebras, would be to classify all sets of 1- and 2-qubit gates. A third step would be to
classify qubit Hamiltonians (i.e., the infinitesimal-time versions of unitary gates), in terms of which
n-qubit Hamiltonians they can be used to generate. Here the recent work of Cubitt and Montanaro
[9], which classifies qubit Hamiltonians in terms of the complexity of approximating ground state
energies, might be relevant. Yet a fourth possibility would be to classify quantum gates under the
assumption that intermediate measurements are allowed. Of course, these simplifications can also
be combined.
On the classical side, we have left completely open the problem of classifying reversible gate sets
over non-binary alphabets. In the non-reversible setting, it was discovered in the 1950s (see [18])
that Post’s lattice becomes dramatically different and more complicated when we consider gates
over a 3-element set rather than Boolean gates: for example, there is now an uncountable infinity of
clones, rather than “merely” a countable infinity. Does anything similar happen in the reversible
case? Even for reversible gates over (say) {0, 1, 2}n , we cannot currently give an algorithm to
decide whether a given gate G generates another gate H any better than the triple-exponential-
time algorithm that comes from clone theory, nor can we give reasonable upper bounds on the
number of gates or ancillas needed in the generating circuit, nor can we answer basic questions like
whether every class is finitely generated.
Finally, can one reduce the number of gates in each of our circuit constructions to the limits
imposed by Shannon-style counting arguments? What are the tradeoffs, if any, between the number
of gates and the number of ancilla bits?
11 Acknowledgments
At the very beginning of this project, Emil Jeřábek [12] brought the hCk i and hT6 i classes to our
attention, and also proved that every reversible gate class is characterized by invariants (i.e., that
the “clone-coclone duality” holds for reversible gates). Also, Matthew Cook gave us encouragement,
asked pertinent questions, and helped us understand the hT4 i class. We are grateful to both of
them. We also thank Adam Bouland, Seth Lloyd, Igor Markov, and particularly Siyao Xu for
helpful discussions.
References
[1] S. Aaronson and A. Arkhipov. The computational complexity of linear optics. Theory of
Computing, 9(4):143–252, 2013. Conference version in Proceedings of ACM STOC’2011. ECCC
TR10-170, arXiv:1011.3245.
[2] S. Aaronson and A. Bouland. Generation of universal linear optics by any beam splitter. Phys.
Rev. A, 89(6):062316, 2014. arXiv:1310.6718.
[3] S. Aaronson and D. Gottesman. Improved simulation of stabilizer circuits. Phys. Rev. A,
70(052328), 2004. arXiv:quant-ph/0406196.
[6] M. Ben-Or and R. Cleve. Computing algebraic formulas with a constant number of registers.
In Proc. ACM STOC, pages 254–257, 1988.
[7] C. H. Bennett. Logical reversibility of computation. IBM Journal of Research and Develop-
ment, 17:525–532, 1973.
[8] R. Cleve and J. Watrous. Fast parallel circuits for the quantum Fourier transform. In Proc.
IEEE FOCS, pages 526–536, 2000. arXiv:quant-ph/0006004.
[10] E. Fredkin and T. Toffoli. Conservative logic. International Journal of Theoretical Physics,
21(3-4):219–253, 1982.
[11] D. Gottesman. Class of quantum error-correcting codes saturating the quantum Hamming
bound. Phys. Rev. A, 54:1862–1868, 1996. arXiv:quant-ph/9604038.
[15] E. Knill and R. Laflamme. Power of one bit of quantum information. Phys. Rev. Lett.,
81(25):5672–5675, 1998. arXiv:quant-ph/9802037.
[16] D. E. Knuth. The Art of Computer Programming, Volume 1, 2nd edition. Addison-Wesley,
1969.
[17] R. Landauer. Irreversibility and heat generation in the computing process. IBM Journal of
Research and Development, 5(3):183–191, 1961.
[18] D. Lau. Function Algebras on Finite Sets: Basic Course on Many-Valued Logic and Clone
Theory. Springer, 2006.
[19] S. Lloyd. Any nonlinear one-to-one binary logic gate suffices for computation. Technical Report
LA-UR-92-996, Los Alamos National Laboratory, 1992. arXiv:1504.03376.
[20] J. MacWilliams. Orthogonal matrices over finite fields. American Mathematical Monthly,
76(2):152–164, 1969.
[21] K. Morita, T. Ogiro, K. Tanaka, and H. Kato. Classification and universality of reversible
logic elements with one-bit memory. In Proceedings of the 4th International Conference on
Machines, Computations, and Universality, pages 245–256. Springer-Verlag, 2005.
[22] E. L. Post. The two-valued iterative systems of mathematical logic. Number 5 in Annals of
Mathematics Studies. Princeton University Press, 1941.
[23] M. Saeedi and I. L. Markov. Synthesis and optimization of reversible circuits–a survey. ACM
Computing Surveys, 45(2):21, 2013. arXiv:1110.2574.
[26] Y. Shi. Both Toffoli and controlled-NOT need little help to do universal quantum computation.
Quantum Information and Computation, 3(1):84–92, 2002. quant-ph/0205115.
[27] I. Strazdins. Universal affine classification of Boolean functions. Acta Applicandae Mathemat-
ica, 46(2):147–167, 1997.
[28] T. Toffoli. Reversible computing. In Proc. Intl. Colloquium on Automata, Languages, and
Programming (ICALP), pages 632–644. Springer, 1980.
[29] A. De Vos and L. Storme. r-universal reversible logic gates. Journal of Physics A: Mathematical
and General, 37(22):5815–5824, 2004.
Theorem 81 (Post’s Lattice Lite) Assume the constant functions f = 0 and f = 1, as well as
the identity function f (x) = x, are available for free. Then the only Boolean clones (i.e., classes of
Boolean functions f : {0, 1}n → {0, 1} closed under composition and addition of dummy variables)
are the following:
1. The trivial class.
2. The AND class.
3. The OR class.
4. The monotone class.
5. The NOT class.
6. The affine class.
7. The class of all Boolean functions.
Proof. We take it as known that {AND, OR} generates all monotone functions, {XOR, NOT}
generates all affine functions, and {G, NOT} generates all functions, for any 2-bit non-affine gate
G.
Let C be a Boolean clone that contains the constant 0 and 1 functions. Then C is closed under
restrictions (e.g., if f (x, y) ∈ C, then f (0, y) and f (x, 1) are also in C), and that is the crucial fact
we exploit.
First suppose C contains a non-monotone gate. Then certainly we can construct a NOT gate
by restricting inputs. If, in addition, C contains a non-affine gate, then by Proposition 44, we
can construct a 2-bit non-affine gate by restricting inputs: AND, OR, NAND, NOR, IMPLIES, or
NOT (IMPLIES). Together with the NOT gate, this puts us in class 7. If, on the other hand, C
contains only affine gates, then as long as one of those gates depends on at least two input bits,
by restricting inputs we can construct a 2-bit non-degenerate affine gate: XOR or NOT (XOR).
Together with the NOT gate, this puts us in class 6. If, on the other hand, every gate depends on
only 1 input bit, then we are in class 5.
Next suppose C contains only monotone gates. Clearly the only affine monotone gates are
trivial. Thus, as long as one of the gates is nontrivial, it is non-affine, so Proposition 44 again
implies that we can construct a non-affine 2-bit monotone gate by restricting inputs: AND or OR.
If we can construct only AND gates, then we are in class 2; if only OR gates, then we are in class
3; if both, then we are in class 4. If, on the other hand, every gate is trivial, then we are in class
1.
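The key step in this proof, extracting a NOT gate from any non-monotone gate by restricting inputs, is easy to make concrete. The following Python sketch is illustrative only (the helper name and the NAND example are chosen here for concreteness and are not part of the proof): it searches a two-bit truth table for a single-variable restriction that computes NOT, and for NAND it recovers the familiar identity NAND(x, 1) = NOT(x).

```python
from itertools import product

def restrictions_equal_to_not(f, arity):
    """Yield (variable, assignment) pairs such that fixing every input of f
    except `variable` to the constants in `assignment` yields the NOT function."""
    for var in range(arity):
        for assignment in product([0, 1], repeat=arity - 1):
            def restricted(t, var=var, assignment=assignment):
                bits = list(assignment)
                bits.insert(var, t)        # put the free variable back in position `var`
                return f(*bits)
            if restricted(0) == 1 and restricted(1) == 0:
                yield var, assignment

nand = lambda x, y: 1 - (x & y)            # non-monotone: NAND(0,0) = 1 > NAND(1,1) = 0
print(list(restrictions_equal_to_not(nand, 2)))   # [(0, (1,)), (1, (1,))]: NAND(x, 1) = NOT(x)
```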
The simplicity of Theorem 81 underscores how much more complicated it is to understand
reversible gates than non-reversible gates, when we impose a similar rule in both cases (i.e., that 0
and 1 constant or ancilla bits are available for free).
Proof. That this collapse happens is clear: under the loose ancilla rule, we can always simulate
a NOT gate by applying a NOTNOT gate to the desired bit, as well as to a “dummy” ancilla bit
that will never be used for any other purpose.
To see that no other collapses happen, we must show that the remaining classes are distinct.
Under the usual ancilla rule, the classes are distinct because for any pair of classes we can find an
invariant satisfied by one, but not the other, to separate the two. We would like to do the same
for loose ancilla classes, but invariants under the usual rule need not, a priori, be invariants under
the loose ancilla rule. More concretely, as we have seen, a gate set that preserves parity under the
usual rule need no longer preserve it under the loose ancilla rule. However, we claim that all the
other invariants are also loose ancilla invariants.
Suppose G (x, a) = (H (x) , b) is a transformation generated under the loose ancilla rule, where
a and b are constants, so that under the loose ancilla rule, we have also generated H. We would
like to show that any invariant of G must also hold for H, so let us consider the invariants one by
one.
• If G is mod-k-respecting, then
$$|G(x, a)| - |(x, a)| = |H(x)| - |x| + |b| - |a|$$
is constant modulo k, and hence |H(x)| − |x| is constant modulo k, so H is mod-k-respecting. For k ≥ 3, mod-k-respecting is equivalent to mod-k-preserving by Theorem 12. When k = 2, we have already seen that NOTNOT collapses with NOT.
• If G is conservative then 0 = |G(x, a)| − |(x, a)| = |H(x)| − |x| + |b| − |a| as above. If we average over all x and appeal to reversibility, then we see that |a| − |b| must be 0, and hence H is conservative.
• If G is affine then
$$G\begin{pmatrix} x \\ a \end{pmatrix} = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix}\begin{pmatrix} x \\ a \end{pmatrix} + \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} H(x) \\ b \end{pmatrix},$$
so clearly H(x) = M_{11}x + M_{12}a + c_1 is affine as well. Since M_{21}x + M_{22}a + c_2 = b for all x, we must have M_{21} = 0. But this means that if the columns of
$$\begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix},$$
the linear part of G, have a given weight modulo 2, a given weight modulo 4, or are orthogonal, then the same is true of the columns of
$$\begin{pmatrix} M_{11} \\ 0 \end{pmatrix},$$
and hence of the columns of M_{11} itself. In short, if the linear part of G has any of the properties we are interested in, then so does the linear part of H.
• If G is orthogonal then c_1 = 0 and c_2 = 0. Recall that M_{21} = 0, and since a matrix of the form
$$\begin{pmatrix} A & B \\ 0 & C \end{pmatrix}$$
has an inverse of the same form, and the inverse of an orthogonal matrix is its transpose, we see that M_{12} = 0. It follows that H(x) = M_{11}x + M_{12}a + c_1 is actually just H(x) = M_{11}x when G is orthogonal; and since the linear part of G is then orthogonal and block-diagonal, M_{11} is itself orthogonal, so H is orthogonal.
14 Appendix: Number of Gates Generating Each Class
In this appendix, we count how many n-bit gates belong to each of the classes of Theorem 3. Let
us write hGin for the set of n-bit gates generated by G, and # hGin for the number of n-bit gates
generated by G. Then Theorem 83 gives the exact number of gates in each class, while Theorem 87
gives the asymptotics.
Furthermore,
# hF4 in = # hT4 in .
Let us count each class in turn. To start, note that an n-bit reversible gate is, by definition, a
permutation of {0, 1}n , so there are (2n )! gates in total.
Parity-preserving gates map even-weight strings to even-weight strings, and odd-weight strings to odd-weight strings. It follows that there are $(2^{n-1}!)^2$ parity-preserving gates. Clearly there are exactly twice as many parity-respecting gates, since we can append a NOT gate to any parity-preserving gate to get a parity-flipping gate, and vice versa.
The mod-k-preserving gates (for k ≥ 3) also decompose into a product of permutations, one for each Hamming weight class modulo k. This leads to the formula
$$\#\langle C_k \rangle_n = \prod_{i=0}^{k-1}\left(\sum_{j \equiv i \pmod{k}} \binom{n}{j}\right)!\,.$$
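Both counts are easy to confirm by brute force for very small n. The following minimal Python sketch (illustrative only; the function names are placeholders and not part of the proof) enumerates all permutations of {0,1}^n and compares against the formulas above.

```python
from itertools import permutations
from math import comb, factorial, prod

def count_mod_k_preserving(n, k):
    """Brute-force count of permutations of {0,1}^n that preserve Hamming weight mod k."""
    strings = range(2 ** n)
    weight = lambda s: bin(s).count("1")
    return sum(all(weight(perm[s]) % k == weight(s) % k for s in strings)
               for perm in permutations(strings))

def product_of_factorials(n, k):
    """Product over residues i of (number of n-bit strings of weight congruent to i mod k)!."""
    return prod(factorial(sum(comb(n, j) for j in range(i, n + 1, k))) for i in range(k))

for n, k in [(2, 2), (3, 2), (3, 3)]:
    assert count_mod_k_preserving(n, k) == product_of_factorials(n, k)
print("brute-force counts match the formulas")   # e.g. n=3, k=2 gives (2^{n-1}!)^2 = 576
```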
The linear part of an affine gate is an n × n invertible matrix A. The number of such matrices is well-known to be
$$\prod_{i=0}^{n-1}\left(2^{n} - 2^{i}\right) = 2^{n(n-1)/2}\prod_{i=1}^{n}\left(2^{i} - 1\right).$$
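This closed form, too, can be checked directly for small n by enumerating all n × n matrices over F_2 (again a minimal illustrative sketch, with placeholder names):

```python
from itertools import product
from math import prod

def rank_f2(rows, n):
    """Rank over F_2 of an n x n matrix whose rows are given as n-bit integers."""
    rows = list(rows)
    rank = 0
    for col in range(n):
        pivot = next((i for i in range(rank, n) if (rows[i] >> col) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(n):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank

def count_invertible(n):
    return sum(rank_f2(rows, n) == n for rows in product(range(2 ** n), repeat=n))

def gl_formula(n):
    return 2 ** (n * (n - 1) // 2) * prod(2 ** i - 1 for i in range(1, n + 1))

for n in [1, 2, 3]:
    assert count_invertible(n) == gl_formula(n)   # 1, 6, 168
print("invertible-matrix counts agree")
```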
We refer to MacWilliams [20] for the formula (below) for the number of orthogonal n × n matrices:
$$\#\langle T_4 \rangle_n = \begin{cases} 2^{m^2}\displaystyle\prod_{i=1}^{m-1}\left(2^{2i}-1\right), & \text{if } n = 2m,\\[2ex] 2^{m^2}\displaystyle\prod_{i=1}^{m}\left(2^{2i}-1\right), & \text{if } n = 2m+1.\end{cases}$$
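The two cases can be confirmed by brute force for n ≤ 4 (a minimal illustrative sketch, with placeholder names):

```python
from itertools import product

def is_orthogonal(rows, n):
    """Check A A^T = I over F_2, where rows[i] is row i of A as an n-bit integer."""
    return all((bin(rows[i] & rows[j]).count("1") & 1) == (i == j)
               for i in range(n) for j in range(n))

def count_orthogonal(n):
    return sum(is_orthogonal(rows, n) for rows in product(range(2 ** n), repeat=n))

def macwilliams(n):
    m = n // 2
    upper = m - 1 if n % 2 == 0 else m        # m-1 factors if n = 2m, m factors if n = 2m+1
    result = 2 ** (m * m)
    for i in range(1, upper + 1):
        result *= 2 ** (2 * i) - 1
    return result

for n in [1, 2, 3, 4]:
    assert count_orthogonal(n) == macwilliams(n)   # 1, 2, 6, 48
print("orthogonal-matrix counts agree")
```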
We now turn our attention to counting hT6 in , which is more involved. The approach will be similar
to that of MacWilliams [20]. It will help to consider hT4 in and hT6 in as groups. Indeed, hT4 in is
just the orthogonal group O(n) over F2 , and hT6 in is a proper subgroup.
The idea is to find a unique representative for each of the cosets of hT6 in in hT4 in . Since we
know # hT4 in by [20], dividing by the number of unique representatives will give us # hT6 in as
desired.
Recall that by Lemma 16, the Hamming weight of each column vector of an orthogonal matrix
is either 1 mod 4 or 3 mod 4. If A ∈ hT4 in is an orthogonal matrix with column vectors a1 , . . . , an ,
then the characteristic vector c (A) is an n-dimensional vector whose ith entry, ci (A), is defined as
follows:
$$c_i(A) = \begin{cases} 1 & \text{if } |a_i| \equiv 3 \pmod{4},\\ 0 & \text{if } |a_i| \equiv 1 \pmod{4}.\end{cases}$$
The following lemma shows that these characteristic vectors can be used as representatives for the cosets of ⟨T6⟩_n.
Lemma 84 Two orthogonal transformations, A, B ∈ hT4 in , are in the same coset of hT6 in if and
only if c (A) = c (B).
Proof. Note that A and B are in the same coset if and only if T := BA^{-1} = BA^{T} is in ⟨T6⟩_n. We know that T ∈ ⟨T4⟩_n, and that T(a_i) = b_i for all i. Since a_1, . . . , a_n is an orthogonal basis, Theorem 37 says that T ∈ ⟨T6⟩_n if and only if T is mod-4-preserving. By Theorem 20, this holds if and only if |a_i| ≡ |b_i| (mod 4) for all i, or equivalently, c(A) = c(B).
Lemma 84 shows that it suffices to count the number of possible characteristic vectors. Perhaps
surprisingly, not every characteristic vector is achievable; the following lemma shows exactly which
ones are.
Lemma 85 If A ∈ hT4 in , then |c (A)| ≡ 0 (mod 4). Furthermore, for every characteristic vector
c such that |c| ≡ 0 (mod 4), there exists a matrix A ∈ hT4 in such that c(A) = c.
Proof. Let A ∈ hT4 in with column vectors a1 , . . . , an . Of course, A might not preserve Hamming
weight mod 4. The main idea of the proof is to promote A to an affine function f (x) = Ax ⊕ b
that does preserve Hamming weight mod 4. We know that such a function exists because we can
decompose A into a circuit of T4 gates by Theorem 36. Replacing each such gate with F4 will
yield a circuit of the desired form that preserves Hamming weight mod 4.
Recall from Theorem 20 that if f preserves Hamming weight mod 4, then |ai | + 2 (ai · b) ≡
1 (mod 4). Expanding out this condition we get
$$a_i \cdot b \equiv \begin{cases} 1 \pmod{2} & \text{if } |a_i| \equiv 3 \pmod{4},\\ 0 \pmod{2} & \text{if } |a_i| \equiv 1 \pmod{4},\end{cases}$$
which is equivalent to the condition A^T b = c(A), i.e., b = A c(A) since A is orthogonal. Therefore,
$$|b| = |A\,c(A)| = \left|\bigoplus_{i=1}^{n} c_i(A)\,a_i\right| \equiv \sum_{i=1}^{n} c_i(A)\,|a_i| + 2\sum_{i<j} c_i(A)\,c_j(A)\,(a_i \cdot a_j) \equiv \sum_{i=1}^{n} c_i(A)\,|a_i| \equiv 3\,|c(A)| \pmod{4},$$
where the middle step uses the fact that the a_i are pairwise orthogonal. But we know by Theorem 20 that |b| ≡ 0 (mod 4). So 3|c(A)| ≡ 0 (mod 4), and hence |c(A)| ≡ 0 (mod 4), which completes the first part of the lemma.
We now need to show that any characteristic vector of Hamming weight divisible by 4 is realized
by some matrix A ∈ hT4 in . Notice that c (T4 ) = (1, 1, 1, 1). Therefore, by taking an appropriate
tensor product of T4 gates and permuting the rows and columns, we can achieve any characteristic
vector of Hamming weight divisible by 4.
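Lemmas 84 and 85 can be verified exhaustively for n = 4 (a minimal illustrative sketch, with placeholder names): every orthogonal 4 × 4 matrix has a characteristic vector of weight divisible by 4, only the two admissible vectors 0000 and 1111 occur, and hence, by Lemma 84, ⟨T4⟩_4 splits into exactly two cosets of ⟨T6⟩_4.

```python
from itertools import product

def orthogonal_matrices(n):
    """All n x n binary matrices (tuples of row-integers) with A A^T = I over F_2."""
    for rows in product(range(2 ** n), repeat=n):
        if all((bin(rows[i] & rows[j]).count("1") & 1) == (i == j)
               for i in range(n) for j in range(n)):
            yield rows

def characteristic_vector(rows, n):
    """c_i(A) = 1 if column i has Hamming weight 3 mod 4, and 0 if it has weight 1 mod 4."""
    cols = [sum(((rows[r] >> c) & 1) << r for r in range(n)) for c in range(n)]
    return tuple(1 if bin(col).count("1") % 4 == 3 else 0 for col in cols)

n = 4
vectors = {characteristic_vector(A, n) for A in orthogonal_matrices(n)}
assert all(sum(c) % 4 == 0 for c in vectors)   # first part of Lemma 85
print(sorted(vectors))                         # [(0, 0, 0, 0), (1, 1, 1, 1)]: two cosets, so #<T6>_4 = 48 / 2 = 24
```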
Proof. The condition A^T b = c(A) in the proof of Lemma 85 implies that there is a unique vector b = A c(A) such that f(x) = Ax ⊕ b is mod-4-preserving.
Combining Lemmas 84 and 85, we find that the number of representatives for the cosets of ⟨T6⟩_n in ⟨T4⟩_n equals the number of n-bit strings with Hamming weight divisible by 4. An explicit formula for this quantity is given by Knuth [16, p. 70]. This completes the proof of Theorem 83.
Table 1 gives the number of generators of each class for 3 ≤ n ≤ 7.
Theorem 87 The asymptotic size of each reversible gate class is as follows:
\begin{align*}
\log_2 \#\langle \mathrm{Toffoli} \rangle_n &= n2^n - \frac{2^n}{\ln 2} + \frac{n}{2} + \frac{1}{2}\log_2 2\pi + O(2^{-n}) \\
\log_2 \#\langle \mathrm{Fredkin}, \mathrm{NOTNOT} \rangle_n &= n2^n - \frac{2^n}{\ln 2} - 2^n + n + \log_2 \pi + O(2^{-n}) \\
\log_2 \#\langle \mathrm{Fredkin}, \mathrm{NOT} \rangle_n &= \log_2 \#\langle \mathrm{Fredkin}, \mathrm{NOTNOT} \rangle_n + 1 \\
\log_2 \#\langle C_k \rangle_n &= n2^n - \frac{2^n}{\ln 2} - 2^n \log_2 k + o(2^n) \\
\log_2 \#\langle \mathrm{Fredkin} \rangle_n &= n2^n - \frac{2^n}{\ln 2} - 2^n \log_2 \sqrt{\frac{\pi e n}{2}} + o(2^n) \\
\log_2 \#\langle \mathrm{CNOT} \rangle_n &= n(n+1) - \alpha + O(2^{-n}) \\
\log_2 \#\langle \mathrm{CNOTNOT}, \mathrm{NOT} \rangle_n &= n(n-1) - \alpha + O(2^{-n}) \\
\log_2 \#\langle \mathrm{CNOTNOT} \rangle_n &= \log_2 \#\langle \mathrm{CNOTNOT}, \mathrm{NOT} \rangle_n - 1 \\
\log_2 \#\langle \emptyset \rangle_n &= n \log_2 n - \frac{n}{\ln 2} + \frac{1}{2}\log_2 2\pi n + O\!\left(\frac{1}{n}\right) \\
\log_2 \#\langle T_4 \rangle_n &= \frac{n(n-1)}{2} - \beta + O(2^{-n}) \\
\log_2 \#\langle T_6 \rangle_n &= \frac{n^2 - 3n + 4}{2} - \beta + O(2^{-n/2}),
\end{align*}
where
$$\alpha = -\sum_{i=1}^{\infty} \log_2(1 - 2^{-i}) \approx 1.7919, \qquad \beta = -\sum_{i=1}^{\infty} \log_2(1 - 2^{-2i}) \approx 0.53839.$$
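Both constants converge geometrically, so a few dozen terms of their partial sums already reproduce the quoted digits (a quick numerical check, not part of the proof):

```python
from math import log2

alpha = -sum(log2(1 - 2.0 ** -i) for i in range(1, 60))
beta = -sum(log2(1 - 4.0 ** -i) for i in range(1, 60))   # 4^{-i} = 2^{-2i}
print(round(alpha, 4), round(beta, 5))                   # 1.7919 0.53839
```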
Recall that # hF4 in = # hT4 in . The asymptotics of the remaining affine classes follow from the
rules
log2 # hG, NOTin = n + log2 # hGin ,
log2 # hG, NOTNOTin = n − 1 + log2 # hGin ,
where hGi is a linear class.
Proof. Most of these results follow directly from Theorem 83 with liberal use of well-known logarithm properties, especially Stirling's approximation:
$$\log_2(m!) = m\log_2 m - \frac{m}{\ln 2} + \frac{1}{2}\log_2 2\pi m + O\!\left(\frac{1}{m}\right).$$
For the affine classes, we use the fact that
$$\sum_{i=1}^{m}\log_2(2^i - 1) = \frac{m(m+1)}{2} + \sum_{i=1}^{m}\log_2(1 - 2^{-i}) = \frac{m(m+1)}{2} - \alpha + O(2^{-m}),$$
where $\alpha = -\sum_{i=1}^{\infty}\log_2(1 - 2^{-i})$. Note that $\alpha = -\log_2(1/2;\, 1/2)_\infty$, where $(1/2;\, 1/2)_\infty$ is the q-Pochhammer symbol. Similarly, $\beta := -\sum_{i=1}^{\infty}\log_2(1 - 2^{-2i}) = -\log_2(1/4;\, 1/4)_\infty$ differs from the mth partial sum by $O(2^{-2m})$.
It turns out that the even and odd cases of # hT4 i have the same asymptotic behavior, and
similarly for the four cases of # hT6 i.
However, there are two special cases that require extra care: ⟨C_k⟩ (for k ≥ 3) and ⟨Fredkin⟩. Recall that
$$\#\langle C_k \rangle_n = \prod_{i=0}^{k-1} a_i!,$$
where we define $a_i = \sum_{j \equiv i \pmod{k}} \binom{n}{j}$. Clearly $a_i = \frac{2^n}{k}\left(1 + o(1)\right)$. Then Stirling's approximation gives
\begin{align*}
\log_2 \#\langle C_k \rangle_n &= \sum_{i=0}^{k-1}\left(a_i \log_2 a_i - \frac{a_i}{\ln 2} + o(a_i)\right) \\
&= \sum_{i=0}^{k-1}\left(a_i \log_2 \frac{2^n}{k} + a_i \log_2(1 + o(1)) - \frac{a_i}{\ln 2} + o(a_i)\right) \\
&= n2^n - \frac{2^n}{\ln 2} - 2^n \log_2 k + o(2^n).
\end{align*}
For ⟨Fredkin⟩, we use the fact that if x is a uniformly random n-bit string, then the entropy of |x| is
$$\frac{1}{2}\log_2 \frac{\pi e n}{2} + O\!\left(\frac{1}{n}\right) = -\sum_{i=0}^{n} 2^{-n}\binom{n}{i}\log_2\left(2^{-n}\binom{n}{i}\right).$$
One can show this by approximating the binomial with a Gaussian distribution. Rearranging gives us
$$\sum_{i=0}^{n}\binom{n}{i}\log_2\binom{n}{i} = n2^n - 2^n\log_2\sqrt{\frac{\pi e n}{2}} - O\!\left(\frac{2^n}{n}\right).$$
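The Gaussian approximation behind this step is easy to confirm numerically: even for moderate n, the exact entropy of |x| is already close to (1/2) log2(πen/2) (a minimal illustrative check):

```python
from math import comb, e, log2, pi

def binomial_weight_entropy(n):
    """Entropy (in bits) of the Hamming weight of a uniformly random n-bit string."""
    probs = (comb(n, i) / 2 ** n for i in range(n + 1))
    return -sum(p * log2(p) for p in probs if p > 0)

for n in [10, 100, 1000]:
    print(n, round(binomial_weight_entropy(n), 4), round(0.5 * log2(pi * e * n / 2), 4))
```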
One can clearly see “the pervasiveness of universality” in Table 1: within almost every class, the gates that are universal for that class quickly come to vastly outnumber the gates that are not. Theorem 87 lets us make that observation rigorous.
Corollary 88 Let C be any reversible gate class, and let G be an n-bit gate chosen uniformly at random from C. Then
$$\Pr[G \text{ generates } C] = 1 - O(2^{-n}),$$
unless C is one of the “NOT classes” (⟨Fredkin, NOT⟩, ⟨F4, NOT⟩, ⟨T6, NOT⟩, or ⟨NOT⟩), in which case
$$\Pr[G \text{ generates } C] = \frac{1}{2} - O(2^{-n}).$$
$$\left|G^{2^{p-1}}(x)\right| \equiv |x| + \frac{k}{2} \pmod{k},$$
and so would have order exactly 2 mod k. For that reason, it suffices to rule out, for all k ≥ 2 and
all n, the possibility of a reversible transformation G that satisfies
$$|G(x)| \equiv |x| + k \pmod{2k}$$
for all x ∈ {0, 1}^n. Let A_{n,j} be the set of n-bit strings of Hamming weight j mod 2k. Then the problem boils down to showing that for all k ≥ 2 and n, there exists a j < k such that |A_{n,j}| ≠ |A_{n,j+k}|, and therefore that no mapping from A_{n,j} to A_{n,j+k} (or vice versa) can be reversible.
This, in turn, can be interpreted as a statement about binomial coefficients: for all k ≥ 2 and all n, there exists a j such that
$$\sum_{i = j,\, j+2k,\, j+4k,\ldots}\binom{n}{i} \;\ne\; \sum_{i = j,\, j+2k,\, j+4k,\ldots}\binom{n}{i+k}.$$
A nice way to prove the above statement is by using what we call the wraparound Pascal’s triangle
of width 2k: that is, Pascal’s triangle with a periodic boundary condition. This is simply an
iterative map on row vectors (a0 , . . . , a2k−1 ) ∈ Z2k , obtained by starting from the row (1, 0, . . . , 0),
then repeatedly applying the update rule a′i := ai + a(i−1) mod 2k for all i. So for example, when
2k = 4 we obtain
$$\begin{array}{rrrr}
1 & 0 & 0 & 0\\
1 & 1 & 0 & 0\\
1 & 2 & 1 & 0\\
1 & 3 & 3 & 1\\
2 & 4 & 6 & 4\\
6 & 6 & 10 & 10\\
16 & 12 & 16 & 20\\
\vdots & \vdots & \vdots & \vdots
\end{array}$$
It is not hard to see that the ith entry of the nth row of the above “triangle” encodes |A_{n,i}|: that is, the number of n-bit strings whose Hamming weights are congruent to i mod 2k.
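The update rule is also easy to experiment with. The following Python sketch (illustrative only; the function names are placeholders) generates rows of the wraparound Pascal's triangle and searches for a row of the doubled form discussed next; as the argument below establishes, the search comes up empty for every k ≥ 2.

```python
def wraparound_rows(k, num_rows):
    """Rows of the wraparound Pascal's triangle of width 2k.

    Row n holds |A_{n,i}| for i = 0, ..., 2k-1: the number of n-bit strings
    whose Hamming weight is congruent to i mod 2k."""
    width = 2 * k
    row = [1] + [0] * (width - 1)            # row 0: the empty string has weight 0
    for _ in range(num_rows):
        yield row
        row = [row[i] + row[(i - 1) % width] for i in range(width)]   # a'_i := a_i + a_{(i-1) mod 2k}

def first_doubled_row(k, num_rows=200):
    """Return the first n whose row consists of the same k numbers repeated twice, if any."""
    for n, row in enumerate(wraparound_rows(k, num_rows)):
        if row[:k] == row[k:]:
            return n
    return None

for k in range(2, 7):
    print(k, first_doubled_row(k))           # None for every k >= 2
```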
So the problem reduces to showing that, when k ≥ 2, no row of the wraparound Pascal’s triangle
of width 2k can have the form
(a0 , . . . , ak−1 , a0 , . . . , ak−1 ) .
That is, no row can consist of the same list of k numbers repeated twice. (Note we can get rows
that satisfy ai = ai+k for specific values of i: to illustrate, in the width-4 case above, we have
a1 = a3 = 4 in the fifth row, and a0 = a2 = 16 in the seventh row. But we need to show that no
row can satisfy ai = ai+k for all i ∈ {0, . . . , k − 1} simultaneously.) We prove this as follows.
Notice that the update rule that defines the wraparound Pascal's triangle, namely $a'_i := a_i + a_{(i-1) \bmod 2k}$, is just a linear transformation on $\mathbb{R}^{2k}$, corresponding to a $2k \times 2k$ band-diagonal matrix M. For example, when k = 2 we have
$$M = \begin{pmatrix} 1 & 1 & 0 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 1\\ 1 & 0 & 0 & 1 \end{pmatrix}.$$
a0 + a2 + · · · + a2k−2 = a1 + a3 + · · · + a2k−1 .
Alternate Proof of Theorem 19. We will actually prove a stronger result, that if G is any
nontrivial affine gate that preserves Hamming weight mod k, then either k = 2 or k = 4. We
have G (x) = Ax ⊕ b, where A is an n × n invertible matrix over F2 , and b ∈ Fn2 . Since G
is nontrivial, Lemma 18 implies that at least one of A’s column vectors v1 , . . . , vn must have
Hamming weight at least 2; assume without loss of generality that v1 is such a column. Notice
that |G(0^n)| ≡ |b| ≡ 0 (mod k), while
$$G(e_1) = Ae_1 \oplus b = v_1 \oplus b.$$
Clearly |G (e1 )| ≡ |e1 | ≡ 1 (mod k). Let y be an n-bit string whose first bit is 0. Then by
Lemma 17, we have
1 + |y| ≡ |e1 ⊕ y|
≡ |G (e1 ⊕ y)|
≡ |G (e1 ) ⊕ G (y) ⊕ b|
≡ |Ae1 ⊕ b ⊕ b ⊕ G (y)|
≡ |v1 ⊕ G (y)|
≡ |v1 | + |G (y)| − 2 (v1 · G (y))
≡ |v1 | + |y| − 2 (v1 · G (y)) (mod k) .
Thus
2 (v1 · G (y)) ≡ |v1 | − 1 (mod k) .
Note that the above equation must hold for all 2n−1 possible y’s that start with 0. Such y’s, of
course, account for half of all n-bit strings. So we deduce that
$$\Pr_{x \in \{0,1\}^n}\left[2(v_1 \cdot x) \equiv |v_1| - 1 \pmod{k}\right] \ge \frac{1}{2}.$$
Equivalently, if we let S be the set of all $x \in \{0,1\}^{|v_1|}$ such that $2|x| \equiv |v_1| - 1 \pmod{k}$, then we find that
$$\Pr_{x \in \{0,1\}^{|v_1|}}[x \in S] \ge \frac{1}{2}. \tag{8}$$
We claim that (8) cannot hold unless k ∈ {2, 4}. First suppose k ≥ 6 is even, and let
$$S' := \{x \oplus e_1 : x \in S\}$$
contain, for each x ∈ S, the string x′ obtained by flipping the first bit of x.
and S and S ′ are disjoint (since no two elements of S are neighbors in the Hamming cube). So
it suffices to show that S ∪ S′ still does not cover all of $\{0,1\}^{|v_1|}$. Since k/2 ≥ 3, observe that S′ can contain at most one string of Hamming weight 1, namely x′ = 10···0 (the neighbor of x = $0^{|v_1|}$). But since |v_1| ≥ 2, there are other strings of Hamming weight 1, not included in S′. Hence S ∪ S′ ≠ $\{0,1\}^{|v_1|}$.
Next suppose k ≥ 3 is odd. Then first, we claim that we cannot have |v_1| = 2. For suppose we did. Then |b ⊕ v_1| would be either |b|, or |b| − 2, or |b| + 2. But this contradicts the facts that |b| ≡ 0 (mod k), while |b ⊕ v_1| ≡ 1 (mod k). Since |v_1| ≠ 1, this means that |v_1| ≥ 3. But in that case, we can use a similar argument as before to show that (8) cannot hold, and that $|S| < 2^{|v_1|-1}$.
Letting S′ be as above, we again have that |S| = |S′|, and that S and S′ are disjoint. And we will again show that S ∪ S′ fails to cover all of $\{0,1\}^{|v_1|}$. Notice that, since the Hamming weights of the S elements are separated by k ≥ 3, every S′ element that is “below” an S element must start with 0, and every S′ element that is “above” an S element must start with 1. Also, since |v_1| ≥ 3, there must be some x′ ∈ S′ with a Hamming weight that is neither maximal nor minimal (that is, neither |v_1| nor 0). But since the first bit of x′ has a fixed value, not all strings of Hamming weight |x′| can belong to S′. Hence S ∪ S′ ≠ $\{0,1\}^{|v_1|}$, and $|S| < 2^{|v_1|-1}$.
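The combinatorial core of this argument, namely that S occupies strictly less than half of $\{0,1\}^{|v_1|}$ whenever k ∉ {2, 4}, can also be checked numerically for small parameters (a minimal illustrative sketch, with placeholder names):

```python
from math import comb

def s_size(w, k):
    """|S| for |v_1| = w: the number of x in {0,1}^w with 2|x| = w - 1 (mod k)."""
    return sum(comb(w, t) for t in range(w + 1) if (2 * t - (w - 1)) % k == 0)

for k in [3, 5, 6, 7, 8, 9, 10]:
    for w in range(2, 15):
        assert s_size(w, k) < 2 ** (w - 1)
print("S occupies strictly less than half of {0,1}^w in every case checked")
# By contrast, for k = 2 and odd w every string is in S, and for k = 4 and odd w exactly half are.
```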