An Almost-linear Time Decoding Algorithm for Quantum LDPC Codes Under Circuit-level Noise
Antonio deMarti iOlius,1, ∗ Imanol Etxezarreta Martinez,2, † Joschka Roffe,3, ‡ and Josu Etxezarreta Martinez1, §
1 Department of Basic Sciences, Tecnun - University of Navarra, 20018 San Sebastian, Spain.
2 Independent researcher, San Sebastian, Spain.
3 Quantum Software Lab, University of Edinburgh, United Kingdom.
Fault-tolerant quantum computers must be designed in conjunction with classical co-processors
that decode quantum error correction measurement information in real-time. In this work, we in-
troduce the belief propagation plus ordered Tanner forest (BP+OTF) algorithm as an almost-linear
time decoder for quantum low-density parity-check codes. The OTF post-processing stage removes
qubits from the decoding graph until it has a tree-like structure. Provided that the resultant loop-
free OTF graph supports a subset of qubits that can generate the syndrome, BP decoding is then
guaranteed to converge. To enhance performance under circuit-level noise, we introduce a technique
for sparsifying detector error models. This method uses a transfer matrix to map soft information
from the full detector graph to the sparsified graph, preserving critical error propagation informa-
tion from the syndrome extraction circuit. Our BP+OTF implementation first applies standard BP
to the full detector graph, followed by BP+OTF post-processing on the sparsified graph. Numeri-
cal simulations show that the BP+OTF decoder achieves logical error suppression within an order
of magnitude of state-of-the-art inversion-based decoders while maintaining almost-linear runtime
complexity across all stages.
The idea behind OTF post-processing is based on the fact that the BP algorithm is guaranteed to converge when applied to decoding graphs with a tree-like structure. The OTF post-processor leverages this by constructing an ordered spanning tree of the QEC code's decoding (Tanner) graph, giving priority to nodes that were assigned a high probability of supporting an error during the initial BP decoding attempt. This spanning tree can contain multiple disconnected components, so we refer to it as an ordered Tanner forest. Once the ordered Tanner forest is formed, the BP algorithm can be applied directly to it to identify a decoding solution that aligns with the measured syndrome.

The OTF post-processor employs a modified version of Kruskal's algorithm to find the ordered Tanner forest, with a worst-case runtime complexity of O(n log(n)) [19]. This process involves searching through the nodes of the decoding graph and eliminating those that would introduce loops. A second round of BP is then run on the ordered Tanner forest, with a linear time complexity proportional to the number of remaining nodes. As a result, the combined BP+OTF decoder achieves an almost-linear runtime relative to the code's block length.

In practice, QEC decoding algorithms are run on a detector graph or detector error model, which relates error mechanisms in the QEC circuit to the measured syndromes [20]. A problem with applying OTF post-processing directly to the detector graph is that its columns are typically high-weight compared to the code capacity or phenomenological graph for the same code. As a consequence, it is often the case that many graph nodes need to be discarded in the search for the ordered Tanner forest, sometimes to the point where the remaining graph no longer contains enough qubits to support a solution to the syndrome. To address this problem, we propose a novel procedure for mapping a detector graph to a sparsified detector graph with fewer short-length loops. When applied to this sparsified detector graph, OTF post-processing is more likely to succeed.

We propose a three-stage decoder specifically optimized for circuit-level noise which we refer to as BP+BP+OTF. First, standard BP decoding is applied to the full detector graph. Second, the soft-information output of the first BP round is mapped to the sparsified detector graph via a pre-defined transfer matrix. Using this soft-information as prior information, BP decoding is applied a second time to the sparsified detector graph. Third, OTF post-processing is run on the sparsified detector graph guided by the soft-information output of the previous round of BP. Note that the BP+BP+OTF decoder terminates at the first successful decoding; for example, the final round of OTF will not be invoked if either of the preceding rounds of BP succeeds.

To benchmark the BP+BP+OTF decoder, we run Monte Carlo simulations under depolarizing circuit-level noise for bivariate bicycle codes, specifically the [[72,12,6]], [[108,8,10]], and [[144,12,12]] instances first investigated in [5]. Our results demonstrate that the BP+BP+OTF decoder achieves error suppression performance that is within an order of magnitude of state-of-the-art BP+OSD decoders.

Our sparsification routine for detector error models is general, and has the potential to be useful for other families of decoders beyond BP+BP+OTF. For example, we observe that BP+BP+OSD decoders require considerably fewer BP iterations than is necessary for standalone BP+OSD applied to the full detector graph.

One of the key advantages of BP+BP+OTF is its simplicity: the decoder requires just three applications of a standard BP decoder and a single application of a modification of the well-known Kruskal algorithm for generating the ordered spanning forest. This straightforward design allows for the construction of highly efficient hardware implementations using off-the-shelf application-specific integrated circuits (ASICs). As a result, the BP+OTF decoder is an appealing option for decoding in near-term experiments.

II. PRELIMINARIES

A. Calderbank-Shor-Steane codes

Calderbank-Shor-Steane (CSS) codes describe a large family of quantum error correction protocols that can be expressed in terms of a pair of classical binary codes, Q(HX, HZ). The HX matrix describes X-type stabilisers that detect phase-flip errors and the HZ matrix Z-type stabilisers that detect bit-flip errors. If HX/Z are sparse, we can consider the CSS code to be a quantum LDPC code.

The decoding task for CSS codes involves solving two syndrome equations for X/Z errors

HX/Z · eZ/X mod 2 = sX/Z.    (1)

As the above syndrome equations amount to a pair of classical decoding problems, it is possible to use existing classical decoders to directly decode CSS codes. Note that the above syndrome equation is over the binary field, modulo-2. From this point onwards, we will assume all linear algebra operations are performed modulo-2 unless stated otherwise.

B. Belief propagation decoding

Belief propagation (BP) is a common decoding algorithm in classical communication technologies [8, 9]. Decoders based on BP exploit the sparsity of the LDPC code's parity check matrix to efficiently solve the bit-wise decoding problem

p(e_i) = Σ_{∼i} p(e | s).    (2)
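To make the bit-wise decoding problem in Eq. (2) concrete, the short sketch below computes the exact marginals p(e_i | s) for a toy parity check matrix by brute-force enumeration over all error vectors consistent with the syndrome; BP is an efficient, iterative approximation of exactly these marginals on sparse graphs. The matrix, prior, and syndrome are illustrative placeholders, not taken from the paper.

```python
import numpy as np
from itertools import product

# Toy parity check matrix (illustrative only) and i.i.d. prior p(e_i = 1).
H = np.array([[1, 1, 0, 0],
              [0, 1, 1, 1]], dtype=np.uint8)
prior = 0.1
s = np.array([1, 0], dtype=np.uint8)  # measured syndrome

marginals = np.zeros(H.shape[1])
norm = 0.0
for bits in product([0, 1], repeat=H.shape[1]):
    e = np.array(bits, dtype=np.uint8)
    if np.array_equal(H @ e % 2, s):                      # keep errors consistent with s
        w = np.prod(np.where(e == 1, prior, 1 - prior))   # p(e) under the prior
        marginals += w * e
        norm += w

marginals /= norm  # p(e_i = 1 | s), the quantity BP estimates iteratively
print(marginals)
```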
The specific advantage of the BP decoder lies in its speed. The message passing operations that underpin the algorithm occur locally between the nodes of a graphical representation of the parity check matrix known as a Tanner graph, and can be run in parallel on distributed hardware. In this setting, BP has a worst-case runtime complexity O(n) in the code's block-length n.

The product-sum algorithm is a variant of the BP decoder that is known to yield exact marginals for parity check matrices that have tree-like Tanner graphs. In practice, however, most high-rate LDPC codes contain loops that compromise the performance of product-sum BP and can prevent it from converging to a valid solution. A challenge in designing LDPC codes lies in finding parity check matrices that are sufficiently loop-free and sparse.

For quantum LDPC codes, it is particularly difficult to design HX and HZ matrices that are well-suited for BP decoding. The reason for this can be attributed to degenerate errors that lead to cyclic dependencies in the decoding graph. As such, BP-based algorithms typically do not work as effective decoders for quantum LDPC codes. Indeed, a standard implementation of product-sum BP fails to yield a threshold for the surface code [14]. In practice, quantum decoders require BP to be augmented with a post-processing routine to achieve satisfactory performance in terms of both threshold and sub-threshold error suppression [12–18].

C. Decoding circuit-level noise and the detector error model

In classical error correction, the decoding problem involves directly solving the syndrome equation H · e = s, where H is the parity-check matrix of the code. In quantum error correction, however, an additional layer of complexity arises due to the fact that syndromes are measured using noisy circuits, leading to error propagation that is not described by the CSS code's HX/Z matrices. Instead, the circuit-level decoding problem is characterised by a binary matrix, HCL, where the columns correspond to error mechanisms within the circuit, and the rows correspond to syndrome measurements. For instance, a non-zero entry at position (HCL)_ij indicates that error mechanism j triggers syndrome measurement i. Once the circuit-level matrix HCL is constructed, the decoding problem becomes equivalent to that of decoding a classical linear code, specifically solving HCL · e = s. The key conceptual difference is that the columns of HCL represent circuit error mechanisms, rather than Pauli-X/Z errors, as is the case in code capacity decoding using the HX/Z CSS matrices.

In practice, QEC protocols operate by repeatedly measuring the same stabiliser extraction circuit over time. This repeating structure can be leveraged to simplify the circuit-level decoding problem by mapping it to a detector error model [20]. A detector vector is defined as a linear combination of syndrome measurements that sums to zero when there are no errors in the circuit. In a repeated stabiliser measurement circuit, the most intuitive choice of detectors involves comparing the parity between consecutive syndrome measurements. For example, if the same check yields a value of 1 in two consecutive rounds, the detector value will be trivial. The resulting detector vector sD is therefore sparser than the original syndrome s. Correspondingly, the circuit-level decoding matrix HCL is replaced with a detector matrix Hdem, a d × n binary matrix, where d is the number of measurements (detector elements) and n is the number of possible errors in the circuit [21]. Here, the rows correspond to detector measurements rather than syndromes. Typically, Hdem is also sparser than HCL. This sparser structure of the detector error model, described by the equation Hdem · e = sD, makes the decoding problem more amenable to BP decoding. From this point onwards, we will refer to s as a syndrome (code capacity, phenomenological noise) or a detector (circuit-level noise) interchangeably.

Once a set of fault-mechanisms ê has been identified by decoding the detector error model Hdem · e = s, the task remains to determine their logical action. This can be analysed using the logical observable matrix, Odem ∈ F_2^{k×n}, where k is the number of logical qubits encoded, and n represents the number of possible fault mechanisms in the detector error model. This matrix links errors in the circuit to measured logical observables, which are defined as binary sums of measurements that correspond to the outcomes of logical operator measurements for any logical state encoded by the QEC code. The logical observable matrix associated with the detector error model is crucial for verifying whether the error estimate produced by the decoder successfully corrects the actual faults that occurred. A logical error is detected if the logical observable derived from the decoded error does not match the true logical observable, i.e., if Odem · e ≠ Odem · ê.

The detector error model, as well as the logical observable matrix for a specific detector error model related to the syndrome extraction circuit, can be generated using a stabiliser simulator [22, 23].
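As a minimal illustration of this logical check, the sketch below compares the logical action of the true fault vector and of a decoder estimate that reproduces the same detector outcomes; the matrices are small placeholders rather than a detector error model generated from a real circuit.

```python
import numpy as np

# Placeholder detector and logical observable matrices (d=3 detectors, k=1 logical, n=4 faults).
H_dem = np.array([[1, 0, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 1, 1]], dtype=np.uint8)
O_dem = np.array([[1, 1, 0, 1]], dtype=np.uint8)

e_true = np.array([1, 0, 0, 1], dtype=np.uint8)   # actual circuit faults
e_hat  = np.array([0, 1, 1, 0], dtype=np.uint8)   # decoder estimate

# The estimate reproduces the measured detectors...
assert np.array_equal(H_dem @ e_true % 2, H_dem @ e_hat % 2)

# ...but a logical error occurs iff the logical observables of the two fault sets differ.
logical_error = not np.array_equal(O_dem @ e_true % 2, O_dem @ e_hat % 2)
print("logical error:", logical_error)
```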
III. THE ORDERED TANNER FOREST DECODER, AN ALMOST-LINEAR TIME POST-PROCESSOR

In this section, we introduce the ordered Tanner forest (OTF) decoder as an almost-linear post-processor for general QLDPC codes. The OTF decoder is a post-processing algorithm that is invoked if the original round of BP decoding fails, i.e., when the estimated BP error êBP does not satisfy the syndrome equation, H · êBP ≠ s. Being a post-processor, OTF requires the posterior probabilities coming from a BP decoding stage (or some process able to provide soft reliability information).
The OTF post-processing routine then proceeds as follows:

2. Use the soft-information output of the BP decoding, pBP(e), to order the qubits from most-to-least likely of being in the support of the error.

3. Apply the modified Kruskal algorithm to the parity check matrix, considering qubits in the order determined in step 2, to obtain the OTF parity check matrix Hotf.

4. Solve the OTF decoding problem using a product-sum BP decoder.

5. Verify the output of the OTF decoding: if Hotf · êotf = s, then the decoding is valid. This will be the case for all instances of Hotf where s ∈ image(Hotf), assuming that the BP decoder has been allowed to run for a number of iterations equal to the column count of Hotf.

A. Complexity of OTF post-processing

The runtime complexity of each of the steps in OTF post-processing is outlined below:

3. Belief propagation over the OTF graph. This has worst-case runtime O(n), where n is the block-length of the code [9].

The complexity is dominated by the second step, with runtime O(n log n). Thus, the OTF decoder has almost-linear time complexity with the block-length of the code.

B. Graph sparsity and OTF graph validity

The OTF post-processor can be viewed as an approximate matrix inversion method: by generating a spanning tree of the original graph, it seeks to identify a set of linearly independent columns that can reproduce the syndrome. This set of columns can be identified in O(n log(n)) time, which is a significant improvement over the O(n³) worst-case complexity of matrix inversion via Gaussian elimination. However, unlike Gaussian elimination, Kruskal's algorithm does not guarantee that a compatible set of linearly independent columns will be found such that s ∈ image(Hotf).
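The loop-avoidance step described above can be sketched as a Kruskal-style sweep over the columns, ordered by the BP soft output, with a union-find structure tracking which check nodes are already connected; a column is kept only if all of the checks it touches lie in distinct components. This is a simplified illustration written for this text, not the authors' implementation.

```python
import numpy as np

def find(parent, x):
    # Path-compressing find for the union-find structure.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def ordered_tanner_forest(H, p_bp):
    """Select columns of H (most likely faults first) that do not create loops
    among the check nodes they connect; returns the selected column indices."""
    d, n = H.shape
    parent = list(range(d))              # one union-find entry per check node
    selected = []
    for col in np.argsort(-p_bp):        # order columns by decreasing BP probability
        checks = np.flatnonzero(H[:, col])
        if len(checks) == 0:
            continue
        roots = {find(parent, c) for c in checks}
        if len(roots) < len(checks):     # two checks already connected -> loop
            continue
        r0 = roots.pop()
        for r in roots:                  # merge the components touched by this column
            parent[r] = r0
        selected.append(col)
    return selected

# Toy example with a placeholder parity check matrix and BP soft output.
H = np.array([[1, 1, 0, 1],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=np.uint8)
p_bp = np.array([0.30, 0.05, 0.20, 0.10])
print(ordered_tanner_forest(H, p_bp))
```

In this picture, the accepted columns form the ordered Tanner forest Hotf on which the final round of BP is run.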
The success of OTF post-processing is closely tied to the sparsity of the parity-check matrix's columns: the lower the average column weight j̄, the more likely it is to find a valid ordered Tanner forest Hotf. To understand this, consider the process by which the OTF matrix is constructed. The search begins with an empty matrix H⁰otf. The OTF matrix is then populated by adding columns from the original matrix, H, that do not introduce loops into Hⁱotf, where i denotes the tree growth step (i.e., the number of columns added so far). At each growth step i, the Tanner graph of Hⁱotf is represented as T(VD, VP, E), where VD are the data nodes, VP are the parity nodes, and E are the edges. Each column under consideration from H can similarly be represented as a set of parity nodes ṼP, with the column weight j defined as |ṼP|.

For a new column to introduce a loop in Hotf, it must satisfy the condition |ṼP ∩ VP| > 1. Therefore, a lower average column weight j̄ will lead to a higher chance that a new column will not produce a loop. Consequently, a sparser graph is more likely to result in an ordered Tanner forest Hotf with a greater number of linearly independent columns, thereby increasing the probability that the condition s ∈ image(Hotf) is satisfied.

IV. SPARSIFYING DETECTOR ERROR MODELS

As described in Section II C, QEC codes are decoded by means of a detector graph that relates circuit fault locations to linear combinations of syndrome measurements. The detector error model enables the propagation of errors through the syndrome extraction circuit to be accounted for during decoding. Due to the richer set of error mechanisms it accounts for, it is typically the case that a detector error model graph will be less sparse than the corresponding CSS parity check matrices HX/Z of the code. This lack of sparsity is detrimental to the success of OTF post-processing, as it increases the probability that the generated OTF matrix will not satisfy the validity condition s ∈ image(Hotf). See Section III B for more details.

We now describe a sparsification routine which is designed to re-express a detector error model graph in a sparser form that is more suitable for OTF decoding. Specifically, our method maps the detector graph Hdem ∈ F_2^{d×n} into a sparsified detector graph Hsdem ∈ F_2^{d×n_s} that has maximum column weight at most equal to the maximum column-weight, γ, of HX/Z. This is achieved by finding linear combinations of columns of weight ≤ γ that generate each of the columns in Hdem. As the Hsdem matrix is by design sparser and smaller than the detector matrix Hdem, applying the OTF post-processor to Hsdem is more likely to result in a successful decoding.

Our mapping includes a transfer matrix that allows the error channel associated with each error mechanism in the detector graph to be mapped to an equivalent error channel for the sparsified detector graph. This enables the soft-information output of an initial run of BP applied to the detector model to be re-purposed in the sparsified detector graph. The benefit of this is two-fold: first, the sparsified detector graph decoding will account for aspects of error propagation through the circuit, and second, the BP decoding over the OTF graph will be over a non-uniform noise channel, improving its rate of convergence.

The sparsification routine is motivated by techniques used to map detector error models for surface codes to graphs with column-weight ≤ 2, suitable for decoding with matching decoders such as minimum-weight perfect-matching [25] or union-find [26]. Furthermore, the transfer matrix is a generalization of the methods proposed for mapping circuit-level soft-information in [27].

We now formally define the sparsified detector error model and its corresponding transfer matrix.

Definition 1 (The sparsified detector matrix). The detector error model is a d × n binary matrix Hdem = (h^1_dem, h^2_dem, · · · , h^n_dem), where h^i_dem ∈ F_2^d is the binary vector representing the ith column. Let Λ be a list of n_s indices denoting all columns in Hdem that have Hamming weight |h_dem| ≤ γ, where γ is the maximum Hamming weight of a column in the code's CSS matrices HX/Z. The sparsified detector model, Hsdem, is a d × n_s sub-matrix of the detector model, Hsdem ⊆ Hdem, formed by selecting the low-weight columns of Hdem indexed by Λ. Similarly, the sparsified logical observable matrix Osdem is composed of the subset of columns of Odem indexed by Λ.

Recall that each column in a detector error model matrix, Hdem, corresponds to a fault mechanism in the code's syndrome extraction circuit, each of which triggers a sequence of detectors indexed by the non-zero entries in the column. As such, Hsdem represents a reduced set of error mechanisms. We now define the transfer matrix that relates the full detector error model to its sparsified form:

Definition 2 (The transfer matrix). The transfer matrix Atr ∈ F_2^{n_s×n} describes the mapping from the sparsified detector model to the full detector error model

Hsdem · Atr = Hdem.    (3)

Each column a_i of Atr is a vector that maps column h^i_dem of Hdem to a linear combination of the columns of Hsdem

h^i_dem = Σ_j a_ij h^j_sdem.    (4)

The transfer matrix preserves the action of the logical observable matrix such that

Odem · e^i_dem = Osdem · Σ_j a_ij e^j_sdem,    (5)

where e^i_dem and e^j_sdem refer to the circuit faults associated with columns i and j of the detector and sparsified models, respectively.
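The two definitions above can be exercised on a toy example: the sketch below forms Hsdem from the low-weight columns of a placeholder Hdem and checks the transfer-matrix relation of Eq. (3). The matrices, the value of γ, and the hand-written decomposition of the single high-weight column are all assumptions made for illustration.

```python
import numpy as np

gamma = 2  # assumed maximum column weight of the code's CSS matrices for this toy example

H_dem = np.array([[1, 0, 0, 1],
                  [0, 1, 1, 1],
                  [0, 1, 0, 1]], dtype=np.uint8)

# Definition 1: keep the columns of H_dem with Hamming weight <= gamma.
Lam = np.flatnonzero(H_dem.sum(axis=0) <= gamma)
H_sdem = H_dem[:, Lam]

# Definition 2: a transfer matrix expressing every column of H_dem as a mod-2 sum
# of columns of H_sdem.  Retained columns map to themselves; the weight-3 column 3
# decomposes (by hand, for this example) as column 0 + column 1.
A_tr = np.zeros((H_sdem.shape[1], H_dem.shape[1]), dtype=np.uint8)
for sparse_idx, dem_idx in enumerate(Lam):
    A_tr[sparse_idx, dem_idx] = 1
A_tr[0, 3] = 1   # h^3_dem = h^0_sdem + h^1_sdem (mod 2)
A_tr[1, 3] = 1

assert np.array_equal(H_sdem @ A_tr % 2, H_dem)  # Eq. (3) holds for this toy example
```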
Importantly, the decomposition given by the transfer matrix maps the faults of the detector error model to combinations of faults on the sparser detector graph with the same logical action. Note, also, that the transfer matrix is not unique: multiple decompositions of the columns of Hdem into components of Hsdem are possible. Our primary objective is to identify sparse transfer matrices, where each column a_i in Atr has the smallest possible Hamming weight. This is crucial because the transfer matrix inherently maps low-weight fault mechanisms in the detector error graph to higher-weight errors in the sparsified graph. By ensuring the transfer matrix is sparse, we minimise the likelihood that the mapped errors will exceed the code distance. Additionally, a sparse transfer matrix is advantageous for mapping soft-information to the sparsified detector graph, as will be expanded upon in the next section.

How best to optimise the transfer matrix is an interesting question for future work. For the QEC codes simulated in this work, we found that an exhaustive search method was sufficient. Details of our exhaustive strategy are outlined in Appendix B.

The initial round of BP decoding over the full detector error model produces a soft-information output, denoted as pBP(edem). This output assigns an error probability to each fault mechanism in the detector model Hdem. Our objective is to develop a method to translate pBP(edem) into a corresponding probability vector p(esdem), which allocates error probabilities to each fault mechanism in the sparsified detector error model Hsdem.

The transfer matrix Atr establishes a relationship between fault mechanisms in the detector model and those in the sparsified detector model. Specifically, the list of detector fault mechanisms that trigger a given fault mechanism i in the sparsified model is represented by {j : A^{ij}_tr ≠ 0}.

When combining multiple error probabilities from the detector graph into a single probability for the sparsified graph, it is crucial to account for parity. For instance, if two individual error mechanisms in the detector error model, e^1_dem and e^2_dem, trigger the same component in the sparsified detector matrix, e^k_sdem, their combined effect would be (e^k_sdem + e^k_sdem) mod 2 = 0. Therefore, probabilities should be summed only when an odd number of detector fault mechanisms contribute to the relevant sparsified detector fault mechanism. Thus, we want to determine the probability of the sparsified faults as the random variable defined as the modulo-2 sum (parity constraint) of the binary random variables associated to the detector error model faults that relate to it via Atr. This mapping of probabilities from Hdem to Hsdem can then be accomplished by considering the probability of the exclusive OR (XOR) of the component binary random variables, as follows [28]

p(e^i_sdem) = p( ⊕_{k ∈ {j : A^{ij}_tr ≠ 0}} e^k_dem ) = (1/2) [ 1 − ∏_{k ∈ {j : A^{ij}_tr ≠ 0}} (1 − 2 p(e^k_dem)) ].    (6)
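Eq. (6) translates directly into code: for each sparsified fault, the probabilities of the detector-model faults in its transfer-matrix support are combined with the piling-up formula. The sketch below is a straightforward dense implementation with placeholder numbers.

```python
import numpy as np

def map_soft_information(p_dem, A_tr):
    """Combine detector-model fault probabilities into sparsified-model
    probabilities using the XOR (piling-up) rule of Eq. (6)."""
    n_s = A_tr.shape[0]
    p_sdem = np.empty(n_s)
    for i in range(n_s):
        support = np.flatnonzero(A_tr[i, :])          # {j : A_tr[i, j] != 0}
        p_sdem[i] = 0.5 * (1.0 - np.prod(1.0 - 2.0 * p_dem[support]))
    return p_sdem

# Toy transfer matrix (n_s = 2 sparsified faults, n = 3 detector-model faults).
A_tr = np.array([[1, 0, 1],
                 [0, 1, 1]], dtype=np.uint8)
p_dem = np.array([0.02, 0.01, 0.05])   # BP soft output on the full detector model
print(map_soft_information(p_dem, A_tr))
```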
Algorithm 1 BP+BP decoder

OUTPUT: Estimated error, ê.
1: (êdem, p(edem)) ← BP_decode(s, pch, Hdem)
2: if s == ŝdem then
3:   return êdem
4: end if
5: p(esdem) ← Map(p(edem), Atr)
6: êsdem ← BP_decode(s, p(esdem), Hsdem)
7: return êsdem

The complete BP+BP algorithm is outlined in Algorithm 1. Both stages of the BP algorithm run in linear time.
To fully understand the runtime of the BP+BP decoder, we need to analyse the runtime scaling of the mapping from the detector error model to the sparsified model. To this end, it is insightful to treat the transfer matrix Atr as analogous to a Tanner graph, where the variable nodes represent elements of the detector error model, and the check nodes correspond to elements of the sparsified detector error model. Equation 6 can be interpreted as a message-passing update from the variable nodes to the check nodes. By using the soft information from the first BP update round as the a posteriori soft information, the probabilities p(e^i_sdem) can be computed in a single message-passing step. The computational complexity of this mapping process is essentially equivalent to that of running BP. The key difference is that the column weight is now determined by the transfer matrix, which is not uniquely defined. However, in all QEC codes examined in this study, the transfer matrices did not exhibit column weights exceeding 3, suggesting that the transfer-matrix column weight is generally lower than that of the detector error model. Therefore, the worst-case complexity of the overall BP+BP decoder can be bounded by O(n), indicating linear time complexity.

V. AN ALMOST-LINEAR TIME DECODER FOR QLDPC CODES UNDER CIRCUIT-LEVEL NOISE

We now outline BP+BP+OTF as a decoding method with almost-linear runtime complexity for QLDPC codes operating under circuit-level noise. The decoder first runs the two-stage BP+BP decoder on the detector graph and its corresponding sparsified graph as described in the previous section. If BP+BP fails, the OTF decoder is applied to the sparsified graph as a post-processor. Pseudo-code for the BP+BP+OTF decoder is provided below:

Algorithm 2 BP+BP+OTF decoder for circuit-level noise

INPUT: Measured syndrome: s ∈ F_2^d,
  a priori probabilities of DEM error mechanisms: pch ∈ R^n,
  Detector matrix: Hdem ∈ F_2^{d×n},
  Sparsified detector matrix: Hsdem ∈ F_2^{d×n_s},
  Transfer matrix: Atr ∈ F_2^{n_s×n}
OUTPUT: Estimated error, ê.
1: (êdem, p(edem)) ← BP_decode(s, pch, Hdem)
2: if s == ŝdem then
3:   return êdem
4: end if
5: p(esdem) ← Map(p(edem), Atr)
6: (êsdem, p(esdem)) ← BP_decode(s, p(esdem), Hsdem)
7: if s == ŝsdem then
8:   return êsdem
9: end if
10: êotf ← OTF(Hsdem, p(esdem))
11: return êotf
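The control flow of Algorithm 2 is summarised by the sketch below. The callables bp_decode, map_soft_information and otf_decode are placeholders for a BP decoder that returns an estimate together with its soft output, the mapping of Eq. (6), and the OTF post-processor of Section III; they are assumptions for illustration rather than the interface of the released code.

```python
import numpy as np

def bp_bp_otf(s, p_ch, H_dem, H_sdem, A_tr,
              bp_decode, map_soft_information, otf_decode):
    """Three-stage decoder of Algorithm 2: BP on the detector model, BP on the
    sparsified model with mapped priors, then OTF post-processing."""
    e_dem, p_dem = bp_decode(H_dem, s, p_ch)            # stage 1: BP on the full DEM
    if np.array_equal(H_dem @ e_dem % 2, s):
        return e_dem

    p_sdem = map_soft_information(p_dem, A_tr)          # Eq. (6): map soft information
    e_sdem, p_sdem = bp_decode(H_sdem, s, p_sdem)       # stage 2: BP on the sparsified DEM
    if np.array_equal(H_sdem @ e_sdem % 2, s):
        return e_sdem

    return otf_decode(H_sdem, s, p_sdem)                # stage 3: OTF post-processing
```

As in Algorithm 2, the routine returns at the first stage whose estimate reproduces the measured syndrome, so the OTF stage is only reached when both BP rounds fail.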
The runtime complexity of this decoder is dominated by the O(n log(n)) complexity of the OTF post-processing stage. As such, the full BP+BP+OTF decoder is an almost-linear time decoder for circuit-level decoding of QLDPC codes.

VI. RESULTS

We numerically benchmark the performance of the BP+BP+OTF decoder via Monte Carlo simulations of the bivariate bicycle codes under a depolarising circuit-level noise model [5]. Precise details of the numerical methods used for this section can be found in Appendix C, whilst a detailed specification of the circuit-level noise model is outlined in Appendix D.

To demonstrate the versatility of the sparsified detector model mapping, we conduct tests with various BP-based decoders. One key advantage of the detector error model sparsification technique is its ability to significantly reduce the total number of iterations required for BP to converge. Below, we outline the decoders we simulate, along with the specific BP iteration counts used for each:

• Belief propagation (BP): 1000 iterations over the detector error model graph.

• Two-stage belief propagation (BP+BP): 30 iterations over the detector error model and 100 iterations over the sparsified detector graph.

• Two-stage BP + ordered Tanner forest (BP+BP+OTF): 30 iterations over the detector error model, 100 iterations over the sparsified detector graph and 100 for solving the Tanner forest.

• Belief propagation + ordered statistics decoding (BP+OSD): 1000 iterations over the detector error model Tanner graph.

• Two-stage BP + OSD (BP+BP+OSD): 30 iterations over the detector error model and 100 over the sparsified detector error model.

Note that the iteration numbers used for the different decoding algorithms vary significantly. The primary motivation for this variation is to demonstrate that, by employing the proposed detector error model sparsification technique, far fewer iterations of BP are necessary. This reduction in iterations leads to a substantial decrease in runtime over a straightforward application of BP for N_BP iterations.

A. Bivariate bicycle codes
Figure 2: Logical error rate per syndrome extraction round as a function of the physical error rate for three bivariate bicycle codes under different circuit-level decoding strategies (curves: BP, BP+BP, BP+BP+OTF, BP+BP+OSD, BP+OSD). The left, middle and right plots correspond to bivariate bicycle codes with l = 6, 9 and 12 respectively, m = 6, A = x³ + y + y² and B = y³ + x + x². Each code is simulated over a number of syndrome rounds equal to its distance δ.

Figure 2 shows the results of Monte Carlo decoding simulations for families of bivariate bicycle codes. Decoding simulations were run for the [[72, 12, 6]], [[108, 8, 10]],
and [[144, 12, 12]] bivariate bicycle codes first introduced in [5]. Notably, the proposed BP+BP protocol significantly enhances performance compared to running BP directly on the detector error model Tanner graph across all the codes. Specifically, we observe nearly an order-of-magnitude reduction in the logical error rate. Importantly, this improvement is achieved with approximately 90% fewer overall BP iterations (1000 vs. 130), resulting in a significantly reduced runtime. However, the results also indicate that simply increasing the code size does not yield an improvement in the logical error rate when using BP alone. Therefore, post-processing techniques are still necessary to achieve the desired threshold-like behavior for the bivariate bicycle codes, as anticipated.

Figure 2 illustrates that the performance of BP+OSD is effectively matched by the BP+BP+OSD approach evaluated here. The key point is that this performance parity is achieved with approximately 90% fewer BP iterations overall. Similarly, the proposed BP+BP+OTF achieves a logical error probability within an order of magnitude of BP+OSD, again using around 90% fewer BP iterations. Furthermore, the OTF post-processor has a proven almost-linear runtime complexity, ensuring that the entire decoding process is extremely fast.

It is also noteworthy that the error curve for BP+BP+OTF steadily diverges from those of BP+OSD and BP+BP+OSD as the code distance increases. This divergence may be due to the larger size of the OTF, which might require additional BP iterations to fully converge. In future work, we plan to investigate the number of BP rounds necessary for BP+BP+OTF to match the performance of running BP+OSD on the detector error model.

VII. CONCLUSION

In this work, we introduced BP+BP+OTF as an almost linear-time decoder for QLDPC codes under circuit-level noise. The OTF post-processor removes qubits from the decoding graph until it achieves a tree-like structure, thereby increasing the likelihood that subsequent rounds of BP will converge. Additionally, we presented a novel method for mapping detector error models to sparser matrices while preserving critical information about circuit-level fault propagation. Numerical simulations demonstrate that, using this mapping, the BP+BP+OTF decoder nearly matches the performance of state-of-the-art decoders such as BP+OSD when applied to families of bivariate bicycle codes under circuit-level noise.

The sparsification routine we propose extends beyond the BP+BP+OTF decoder. Our simulations suggest that performing an initial round of decoding on the full detector model, followed by a second round on the sparsified detector model, can enhance performance across various decoders, including plain BP and BP+OSD. The sparsified detector model features fewer loops, fewer columns, and a less redundant structure. Moreover, by mapping the soft information from the first BP round to the sparsified detector model, the second BP round is supplied with a non-uniform error channel, accelerating convergence. Our results show that this mapping to the sparsified detector model significantly reduces the number of required BP iterations across all the decoders we have investigated. We anticipate that other decoder families – for example BP+AC [16], BP+LSD [18], and BP+CB [17] – will also benefit from the sparsified detector error model.

In future work, we will explore the application of the BP+BP+OTF decoder to various code families, including surface codes, hypergraph product codes [6], and lifted product codes [7]. For surface codes, it would be particularly interesting to investigate whether the sparsified detector error model mapping can enhance the performance of existing decoders, such as those in [27]. For instance, this could involve implementing a BP+BP+Union-Find or BP+BP+MWPM approach.
The primary failure mode of the BP+BP+OTF algorithm arises in cases where the OTF post-processor fails to produce a spanning tree capable of supporting the syndrome, i.e., when s ∉ image(Hotf). The probability of such failures can be reduced by optimising the sparsity of the transfer matrix that maps the detector model to its sparsified form. In this work, we used an exhaustive approach to derive the mapping, but further optimisations could be achieved by, for example, explicitly excluding elements of Hsdem that introduce loops. This remains an area for future investigation.

Further improvements in the runtime of BP+BP+OTF could be achieved by exploring parallelisation methods for the tree search step in the OTF post-processor. One possible approach is to combine OTF post-processing with the parallel cluster-growth strategies employed by the BP+LSD decoder [18].

The OTF post-processor operates on the principle that QLDPC decoding can be enhanced by modifying the structure of the decoding graph. Specifically, Kruskal's algorithm is used to identify and eliminate variable nodes that introduce problematic cycles in the Tanner graph. Similarly, the recently introduced BP plus guided decimation (BP+GD) decoder iteratively modifies the decoding graph by excluding variable nodes with the least uncertainty in their soft-information [30]. An interesting direction for future research would be to explore the combination of BP+OTF and BP+GD.

Given its low complexity, BP+BP+OTF is a promising candidate for real-time decoding of syndromes from experimental quantum computers. To this end, dedicated hardware implementations of the algorithm using FPGAs or ASICs will be necessary. Since the BP+BP+OTF algorithm uses standard methods like the well-established product-sum implementation of BP and Kruskal's minimum spanning tree algorithm, it may be possible to construct such a decoder by combining existing commercially available chips. Consequently, development costs could potentially be lower compared to more specialised decoders such as BP+OSD [31].

VIII. CODE AVAILABILITY

The code for the BP+OTF decoder can be found in the following Github repository: https://github.com/Ademartio/BPOTF. In future versions we will include a script for obtaining the transfer matrix, the two-stage BP process and the overall BP+BP+OTF decoder.

IX. ACKNOWLEDGEMENTS

We thank Oscar Higgott for many useful comments and recommendations, as well as Anqi Gong for discussions on loops and trapping sets present in the detector error models of bivariate bicycle codes. We also thank Pedro Crespo for his guidance and the other members of the Quantum Information Group at Tecnun for their support.

This work was supported by the Spanish Ministry of Economy and Competitiveness through the MADDIE project (Grant No. PID2022-137099NBC44), by the Spanish Ministry of Science and Innovation through the project Few-qubit quantum hardware, algorithms and codes, on photonic and solid state systems (PLEC2021-008251), and by the Ministry of Economic Affairs and Digital Transformation of the Spanish Government through the QUANTUM ENIA project call - QUANTUM SPAIN project, and by the European Union through the Recovery, Transformation and Resilience Plan - NextGenerationEU within the framework of the Digital Spain 2026 Agenda. J.R. is funded by the Engineering and Physical Sciences Research Council (Grant codes: EP/T001062/1 and EP/X026167/1).
[1] J. Roffe, "Quantum error correction: an introductory guide," Contemporary Physics, vol. 60, no. 3, pp. 226–245, 2019. [Online]. Available: https://doi.org/10.1080/00107514.2019.1667078
[2] Google Quantum AI, "Quantum error correction below the surface code threshold," 2024. [Online]. Available: https://arxiv.org/abs/2408.13687
[3] S. Krinner, N. Lacroix, A. Remm, A. Di Paolo, E. Genois, C. Leroux, C. Hellings, S. Lazar, F. Swiadek, J. Herrmann, G. J. Norris, C. K. Andersen, M. Müller, A. Blais, C. Eichler, and A. Wallraff, "Realizing repeated quantum error correction in a distance-three surface code," Nature, vol. 605, no. 7911, pp. 669–674, May 2022. [Online]. Available: http://dx.doi.org/10.1038/s41586-022-04566-8
[4] A. deMarti iOlius, P. Fuentes, R. Orús, P. M. Crespo, and J. Etxezarreta Martinez, "Decoding algorithms for surface codes," arXiv, Jul. 2023, https://doi.org/10.48550/arXiv.2307.14989.
[5] S. Bravyi, A. W. Cross, J. M. Gambetta, D. Maslov, P. Rall, and T. J. Yoder, "High-threshold and low-overhead fault-tolerant quantum memory," Nature, vol. 627, no. 8005, pp. 778–782, Mar 2024. [Online]. Available: https://doi.org/10.1038/s41586-024-07107-7
[6] Q. Xu, J. P. B. Ataides, C. A. Pattison, N. Raveendran, D. Bluvstein, J. Wurtz, B. Vasic, M. D. Lukin, L. Jiang, and H. Zhou, "Constant-overhead fault-tolerant quantum computation with reconfigurable atom arrays," 2023. [Online]. Available: https://arxiv.org/abs/2308.08648
[7] T. R. Scruby, T. Hillmann, and J. Roffe, "High-threshold, low-overhead and single-shot decodable fault-tolerant quantum memory," 2024. [Online]. Available: https://arxiv.org/abs/2406.14445
[8] T. Richardson and S. Kudekar, "Design of low-density parity check codes for 5g new radio," IEEE Communications Magazine, vol. 56, no. 3, pp. 28–34, Mar. 2018. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/8316763
[9] D. J. MacKay and R. M. Neal, "Near shannon limit performance of low density parity check codes," Electronics Letters, vol. 33, no. 6, pp. 457–458, 1997.
[10] P. Fuentes, J. Etxezarreta Martinez, P. M. Crespo, and J. Garcia-Frías, "Degeneracy and its impact on the decoding of sparse quantum codes," IEEE Access, vol. 9, pp. 89093–89119, 2021. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/9456887
[11] N. Raveendran and B. Vasić, "Trapping Sets of Quantum LDPC Codes," Quantum, vol. 5, p. 562, Oct. 2021. [Online]. Available: https://doi.org/10.22331/q-2021-10-14-562
[12] A. Grospellier, L. Grouès, A. Krishna, and A. Leverrier, "Combining hard and soft decoders for hypergraph product codes," Quantum, vol. 5, p. 432, Apr. 2021. [Online]. Available: http://dx.doi.org/10.22331/q-2021-04-15-432
[13] P. Panteleev and G. Kalachev, "Degenerate quantum ldpc codes with good finite length performance," Quantum, vol. 5, p. 585, Nov. 2021. [Online]. Available: http://dx.doi.org/10.22331/q-2021-11-22-585
[14] J. Roffe, D. R. White, S. Burton, and E. Campbell, "Decoding across the quantum low-density parity-check code landscape," Phys. Rev. Res., vol. 2, p. 043423, Dec 2020. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevResearch.2.043423
[15] T. R. Scruby and K. Nemoto, "Local probabilistic decoding of a quantum code," Quantum, vol. 7, p. 1093, Aug. 2023. [Online]. Available: http://dx.doi.org/10.22331/q-2023-08-29-1093
[16] S. Wolanski and B. Barber, "Ambiguity Clustering: an accurate and efficient decoder for qLDPC codes," arXiv, p. arXiv:2406.14527, Jun. 2024, https://arxiv.org/abs/2406.14527.
[17] A. deMarti iOlius and J. Etxezarreta Martinez, "The closed-branch decoder for quantum LDPC codes," arXiv, p. arXiv:2402.01532, Feb. 2024, https://doi.org/10.48550/arXiv.2402.01532.
[18] T. Hillmann, L. Berent, A. O. Quintavalle, J. Eisert, R. Wille, and J. Roffe, "Localized statistics decoding: A parallel decoding algorithm for quantum low-density parity-check codes," arXiv, Jun. 2024, https://doi.org/10.48550/arXiv.2406.18655.
[19] J. B. Kruskal, "On the shortest spanning subtree of a graph and the traveling salesman problem," Proceedings of the American Mathematical Society, vol. 7, no. 1, pp. 48–50, 1956.
[20] P.-J. H. S. Derks, A. Townsend-Teague, A. G. Burchards, and J. Eisert, "Designing fault-tolerant circuits using detector error models," arXiv, Jul. 2024, https://doi.org/10.48550/arXiv.2407.13826.
[21] Strictly, the number of columns n is not exactly the number of possible circuit errors, since errors that have the same detector vectors are merged into single columns, adding their a priori probabilities.
[22] C. Gidney, "Stim: a fast stabilizer circuit simulator," Quantum, vol. 5, p. 497, Jul. 2021. [Online]. Available: https://doi.org/10.22331/q-2021-07-06-497
[23] W. Fang and M. Ying, "Symbolic execution for quantum error correction programs," Proceedings of the ACM on Programming Languages, vol. 8, no. PLDI, pp. 1040–1065, Jun. 2024. [Online]. Available: http://dx.doi.org/10.1145/3656419
[24] M. T. Goodrich, R. Tamassia, and M. H. Goldwasser, Data Structures and Algorithms in Python, 1st ed. Wiley, 2013.
[25] O. Higgott, "Pymatching," https://github.com/oscarhiggott/PyMatching, accessed: 2024-07-26.
[26] N. Delfosse and N. H. Nickerson, "Almost-linear time decoding algorithm for topological codes," Quantum, vol. 5, p. 595, Dec. 2021. [Online]. Available: https://doi.org/10.22331/q-2021-12-02-595
[27] O. Higgott, T. C. Bohdanowicz, A. Kubica, S. T. Flammia, and E. T. Campbell, "Improved decoding of circuit noise and fragile boundaries of tailored surface codes," Phys. Rev. X, vol. 13, p. 031007, Jul 2023. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevX.13.031007
[28] Recall that the XOR logical operation is true if and only if the number of true inputs is odd. Furthermore, XOR and modulo-2 sum are equivalent for the binary field, i.e. a parity constraint. We stick to the XOR operation so as to relate the update rule to the Piling-up lemma, which is formulated by means of it [29].
[29] M. Matsui, "Linear cryptanalysis method for des cipher," in Advances in Cryptology — EUROCRYPT '93, T. Helleseth, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 1994, pp. 386–397. [Online]. Available: https://link.springer.com/chapter/10.1007/3-540-48285-7_33
[30] H. Yao, W. A. Laban, C. Häger, A. G. i Amat, and H. D. Pfister, "Belief propagation decoding of quantum ldpc codes with guided decimation," 2024. [Online]. Available: https://arxiv.org/abs/2312.10950
[31] J. Valls, F. Garcia-Herrero, N. Raveendran, and B. Vasić, "Syndrome-based min-sum vs osd-0 decoders: Fpga implementation and analysis for quantum ldpc codes," IEEE Access, vol. 9, pp. 138734–138743, 2021. [Online]. Available: https://ieeexplore.ieee.org/document/9562513
[32] A. Gong, S. Cammerer, and J. M. Renes, "Toward Low-latency Iterative Decoding of QLDPC Codes Under Circuit-Level Noise," arXiv e-prints, p. arXiv:2403.18901, Mar. 2024, https://arxiv.org/abs/2403.18901.
[33] Being strictly correct, the first round is special due to the fact that there is no propagation of previous rounds and, thus, such first round should be processed independently [32].
[34] J. Roffe, "LDPC: Python tools for low density parity check codes," 2022. [Online]. Available: https://pypi.org/project/ldpc/

Appendix A: The Union-Find data structure for the OTF decoder

In order to search an OTF in almost-linear time, we need to consider the union-find data structure. Let us consider a dynamic list of length equal to the number of checks in the code to decode, where each element is a 2-element list; we shall name this list DYNAMIC_LIST. The first element of each pair will indicate the root of the element and the second one will indicate its depth. At the beginning, each element i will be its own root and its depth will be 1, i.e. DYNAMIC_LIST[i] = [i, 1]. Each element in the dynamic list will become a tree of a
to be the new root of the merged tree and its depth will be increased by one. After the first step, there will be elements in the DYNAMIC_LIST
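A minimal rendering of the DYNAMIC_LIST structure described in this appendix is given below: each entry stores a root and a depth, roots are found by walking up the list, and two trees are merged by attaching the shallower root to the deeper one, increasing the depth only when the two depths are equal. It is a standard union-by-depth sketch written to mirror the description above, not the authors' code.

```python
def make_dynamic_list(num_checks):
    # DYNAMIC_LIST[i] = [root, depth]; initially every check is its own root with depth 1.
    return [[i, 1] for i in range(num_checks)]

def find_root(dl, i):
    while dl[i][0] != i:
        i = dl[i][0]
    return i

def union(dl, a, b):
    """Merge the trees containing a and b; return False if they were already joined."""
    ra, rb = find_root(dl, a), find_root(dl, b)
    if ra == rb:
        return False                  # same tree: joining them would create a loop
    if dl[ra][1] < dl[rb][1]:
        ra, rb = rb, ra               # make ra the deeper (or equally deep) root
    dl[rb][0] = ra                    # attach the shallower tree to the deeper root
    if dl[ra][1] == dl[rb][1]:
        dl[ra][1] += 1                # equal depths: the merged tree becomes one level deeper
    return True

dl = make_dynamic_list(5)
print(union(dl, 0, 1), union(dl, 1, 2), union(dl, 0, 2))  # True True False
```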
Algorithm 3 Decomposing detector error model detectors into "phenomenological" elements

INPUT: detector error model column h^i_dem ∈ F_2^d,
  "phenomenological" parity check matrix Hsdem = (h^1_sdem, · · · , h^{n_s}_sdem) ∈ F_2^{d×n_s},
  logical observable detector error model matrix Odem ∈ F_2^{k×n},
  logical observable "phenomenological" matrix Osdem ∈ F_2^{k×n_s}
OUTPUT: decomposition vector for the detector error model column, a_i ∈ F_2^{n_s}
1: ▷ Get the columns of Hsdem that share at least one non-trivial element with the detector error model column, h^i_dem. The map variable records which elements of Hsdem are retained so that the result for Hreduced can be mapped back.
2: Hreduced ← ∅
3: Oreduced ← ∅
4: map ← ∅
5: for j ← 1 to n_s do
6:   for k ← 1 to d do
7:     if h^i_dem(k) == h^j_sdem(k) & h^i_dem(k) == 1 then
8:       Hreduced ← Hreduced ∪ h^j_sdem
9:       Oreduced ← Oreduced ∪ o^j_sdem
10:      map ← map ∪ {j}
11:      break
12:    end if
13:  end for
14: end for
15: ▷ Exhaustive search over the reduced matrix Hreduced to decompose the detector error model column, h^i_dem. cols_r refers to the number of columns of Hreduced = (h^1_reduced, · · · , h^{cols_r}_reduced). Vectors e^i_dem, e^m_reduced refer to the error vectors associated to the i and m columns of their respective parity check matrices.
16: a_i ← ∅
17: for j ← 1 to cols_r do
18:   combs ← GetCombinations([1, 2, · · · , cols_r], j)
19:   for k ← 1 to (cols_r choose j) do
20:     dec ← Σ_{m∈combs(k)} h^m_reduced
21:     e_dec ← Σ_{m∈combs(k)} e^m_reduced
22:     l_sdem ← Oreduced · e_dec
23:     l_dem ← Odem · e^i_dem
24:     if dec == h^i_dem & l_sdem == l_dem then
25:       a_i ← ones(map(combs(k)))
26:       break
27:     end if
28:     if a_i ≠ ∅ then
29:       break
30:     end if
31:   end for
32: end for
33: return a_i

…coincide in its logical effect. Once such a decomposition is found, we generate the decomposition vector a_i with the method ones(map(combs(k))), which adds ones at locations map(combs(k)), i.e. at the positions found but mapped back from Hreduced to the Hsdem of interest. Note that since we exhaustively search from a minimum number of columns to a larger one, the obtained decomposition is expected to consist of a minimum number of sparsified mechanisms.

Algorithm 3 is then looped over all the columns of the detector error model in question and a transfer matrix Atr is found, relating the detector error model and the sparsified parity check matrix of interest. The exhaustive search approach seems demanding due to the fact that the combinatorial number (cols_r choose j) increases very fast with j. However, for all the codes considered in this article, the number of elements in the decomposition has not exceeded 3 components and, thus, it is relatively fast. Importantly, this is done in a pre-processing stage and has no impact on the latency of the decoder. Note, also, that detector error model matrices have a periodic structure from round to round, as can be seen in Figure 1 of [32], implying that the processing can be done for a single propagation round and then extrapolated to the rest of the transfer matrix [33]. It is not within the scope of this work to optimize this decomposition, but to show that this kind of reduction to a sparsified noise model is beneficial for decoding over circuit-level noise. Aiming at the best decomposition and finding better ways to search for it, e.g. using stabiliser simulators such as Stim, is considered future work.

Appendix C: Numerical simulations

Monte Carlo computer simulations of the bivariate bicycle codes have been performed with the objective of obtaining the performance curves (logical error rate). The circuit-level noise simulations have been done in the following way. The sampling of the errors arising from the noisy stabiliser circuit has been done by means of Stim [22]. Stim considers the check measurements over a set of syndrome extractions together with a final measurement of the data qubits. We consider d_c syndrome extraction rounds, where d_c is the distance of the QEC code. We also use the SlidingWindowDecoding [32] Github repository for constructing the Stim syndrome extraction circuits and obtaining the circuit-level noise parity check and observable matrices of the bivariate bicycle codes. The decoder uses those to resolve the syndrome and return an error, which is later compared to the logical action of the Stim error.

The operational figure of merit we use to evaluate the performance of these quantum error correction schemes is the Logical Error Rate per syndrome cycle (PL), i.e. the probability that a logical error has occurred after the recovery operation per syndrome extraction round [5].

Regarding the software implementations of the decoders used to perform the numerical simulations: the BP+OSD implementation is that of the LDPC Github package [14, 34] (with slight modifications for handling circuit-level noise [5]), and our implementation of the proposed decoder can be found in the following Github repository: https://github.com/Ademartio/BPOTF.
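For reference, the generic Stim workflow used in this appendix (circuit construction, detector error model extraction, and detector/observable sampling) looks as follows; a generated rotated surface code memory circuit is used purely as a stand-in, since the bivariate bicycle circuits of this work are built with the SlidingWindowDecoding repository rather than Stim's built-in generators.

```python
import stim

# Stand-in circuit: a distance-3 rotated surface code memory experiment with
# depolarizing circuit-level noise (the paper's circuits are bivariate bicycle codes).
circuit = stim.Circuit.generated(
    "surface_code:rotated_memory_z",
    distance=3,
    rounds=3,
    after_clifford_depolarization=0.001,
)

# Detector error model relating circuit fault mechanisms to the detectors they flip.
dem = circuit.detector_error_model(decompose_errors=True)
print(dem.num_detectors, dem.num_observables)

# Sample detector outcomes and the true logical observables for Monte Carlo decoding.
sampler = circuit.compile_detector_sampler()
detectors, observables = sampler.sample(shots=1000, separate_observables=True)
print(detectors.shape, observables.shape)
```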
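Returning to the decomposition of Appendix B, the exhaustive search of Algorithm 3 can be condensed into the following sketch: candidate columns of Hsdem that overlap the target detector column are collected, and combinations of increasing size are tried until one reproduces both the column and its logical action (passed in as l_dem, the corresponding column of Odem). The function is an illustrative re-implementation, not the pre-processing script of the released code.

```python
import numpy as np
from itertools import combinations

def decompose_column(h_dem, l_dem, H_sdem, O_sdem):
    """Return a binary vector a_i with H_sdem @ a_i = h_dem and O_sdem @ a_i = l_dem (mod 2),
    searching combinations of overlapping columns from smallest to largest."""
    n_s = H_sdem.shape[1]
    # Keep only the sparsified columns that overlap the target column (the 'map' of Algorithm 3).
    cand = [j for j in range(n_s) if np.any(H_sdem[:, j] & h_dem)]
    for r in range(1, len(cand) + 1):
        for combo in combinations(cand, r):
            a_i = np.zeros(n_s, dtype=np.uint8)
            a_i[list(combo)] = 1
            if (np.array_equal(H_sdem @ a_i % 2, h_dem)
                    and np.array_equal(O_sdem @ a_i % 2, l_dem)):
                return a_i
    return None  # no valid decomposition: the sparsified model cannot express this fault
```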
We consider the standard depolarizing (unbiased) circuit-level noise model [4, 5, 27, 32] that consists of:

• Decoherence errors: data qubits are subjected to depolarizing noise before syndrome extraction

• Idle gate (memory) locations: these are followed by a Pauli operator, {I, X, Y, Z}, sampled independently with probabilities pX = pY = pZ = p/3 and pI = 1 − p.
jected to depolarizing noise before syndrome ex-