
Is Circuit Depth Accurate for Comparing Quantum Circuit Runtimes?

Matthew Tremba, Paul Hovland, and Ji Liu
Mathematics and Computer Science Division
Argonne National Laboratory
Lemont, USA
{mtremba, hovland, ji.liu}@anl.gov
Abstract

Although quantum circuit depth is commonly used to approximate circuit runtimes, it overlooks a prevailing trait of current hardware implementations: different gates have different execution times. Recognizing the potential for discrepancies, we investigate depth’s accuracy for comparing runtimes between compiled versions of the same circuit. In particular, we assess the accuracy of traditional and multi-qubit depth for (1) predicting relative differences in runtime and (2) identifying the compiled circuit version(s) with the shortest runtime. Finding that circuit depth is not accurate for either task, we introduce a new metric, gate-aware depth, that weights gates’ contributions to runtime using an architecture’s average gate execution times. Using average gate times allows gate-aware depth to capture variations by gate type without requiring exact knowledge of all gate times, increasing accuracy while maintaining portability across devices of the same architecture. Compared to traditional and multi-qubit depth, gate-aware depth reduces the average relative error of predictions in task (1) by factors of 68 and 18 and increases the average percentage of correct identifications in task (2) by 20 and 43 percentage points, respectively. Finally, we provide gate-aware depth weight configurations for the current IBM Eagle and Heron architectures.

Index Terms:
quantum compilation, circuit depth, runtime

I Introduction

While quantum algorithms already show potential to outperform their classical counterparts, they continue to face significant challenges from hardware noise [1, 2, 3, 4]. In current devices, most noise originates from gate errors or decoherence, in which physical qubits “drift” over time to less predictable states [5]. As a result, the fidelity of computation depends primarily on a given circuit’s gate count and runtime. Obtaining accurate values for these characteristics is therefore critical for assessing the fidelity of circuits and, by extension, the performance of the quantum compilation algorithms which optimized them.

Two primary methods exist for measuring and comparing circuit runtimes. The first, circuit scheduling, operates at the hardware level and constructs an exact timeline and description of the physical process needed to implement the circuit, with the total duration of that process providing the circuit’s exact runtime on the device [6, 7]. The second method, circuit depth, operates at the circuit level and counts the minimum number of layers a circuit can be partitioned into, each of which can be interpreted as a sequential step during execution [8].

While the number of steps gives a loose proxy for the circuit’s runtime, the two may not correlate exactly because different gates have different execution times [9, 10, 11, 12]. For circuits of different sizes, this effect may be negligible; for circuits of similar sizes, and especially for different compiled versions of the same circuit, it may lead to inaccurate runtime comparisons.

As a result, circuit scheduling and circuit depth fall on opposite ends of the accuracy-portability spectrum (see Fig. 1). On one hand, circuit scheduling provides perfectly accurate runtime comparisons, but depends on hardware parameters that may vary considerably between devices. On the other hand, circuit depth is completely independent of the target device, but may provide inaccurate runtime comparisons. This raises two questions:

  1. Is circuit depth accurate for comparing runtimes, particularly between different compiled versions of the same circuit?
  2. If not, is there an intermediate metric that provides higher accuracy while maintaining portability?


Figure 1: Gate-aware depth, our proposed new metric, fills the gap in runtime estimation methods, achieving greater accuracy than circuit depth and greater portability than circuit scheduling.

In this paper, we address these questions by first proposing a new metric, gate-aware depth, to fill the intermediate role and then assessing the accuracies of both it and the existing depth metrics for circuit runtime comparison. Gate-aware depth weights gates’ contributions to circuit runtime using an architecture’s average gate execution times, taking advantage of device consistency within an architecture to provide reasonable accuracy across many devices simultaneously.

We then assess the accuracy of gate-aware, circuit, and multi-qubit depth for comparing runtimes between different compiled versions of the same circuit. In particular, we answer:

  1. How accurately do relative differences in the metrics predict relative differences in runtime between compiled circuit versions?
  2. How accurate are the metrics at identifying the compiled circuit version(s) with the shortest runtime?

Our evaluation shows that circuit depth is not accurate for comparing compiled circuit versions, and that gate-aware depth is highly accurate while maintaining the portability across devices that circuit scheduling lacks. Additionally, we provide suitable gate-aware depth weight configurations for the existing IBM Eagle and Heron architectures.

II Background

II-A Quantum Circuit Optimization

To mitigate the hardware noise of NISQ-era devices, quantum programs are typically optimized by quantum compilers before they are physically run. A variety of compiler optimization and circuit mapping techniques [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34] have been proposed to reduce circuit gate count and depth, since these are indicators for the primary sources of noise [35].

In this context, circuit depth acts as an indicator for runtime, which is itself an approximation of the total decoherence experienced by the underlying quantum system [36]. While methods like circuit scheduling could provide exact runtime for use as an optimization objective, circuit depth is both simple and widely implemented in compiler frameworks such as Qiskit, TKET, and BQSKit [37, 38, 39].

Accordingly, researchers assess the runtime optimization capabilities of algorithms via the depths of the circuits they produce [40, 41]. By compiling circuits with multiple algorithms and comparing the relative depths between the different versions of each circuit, researchers can estimate the average-case runtime improvement produced by each algorithm.

II-B Circuit Schedulers

Once a quantum program has been optimized in the circuit model, a circuit scheduler compiles the program to the next-lower layer of abstraction by constructing a timeline of instructions that implement that circuit on the underlying quantum system [7, 42, 43]. For example, schedulers for IBM’s current devices convert hardware-compliant circuits to a collection of microwave pulses that are then applied to manipulate the superconducting qubits [6].

Besides its usual applications to tasks like gate calibration and error mitigation [6, 44], circuit scheduling can also be used to measure circuit runtime. Since the schedule specifies the exact physical process implementing the circuit, its total duration is precisely that circuit’s quantum runtime.

II-C Circuit Depth

Another option for describing circuit runtime is circuit depth, of which the two main varieties are depth and multi-qubit depth.

Depth, referred to hereafter as traditional depth, counts the minimum number of layers a circuit can be partitioned into, or equivalently the number of gates in the circuit’s longest path of logically dependent gates (the circuit’s critical path) [8]. For example, the circuit in Fig. 2 has a traditional depth of 4 because the critical path, shown in solid blue, contains 4 gates. This path is critical because all other paths, shown in dotted orange and alternating green, have fewer than 4 gates.


Figure 2: An example circuit with three possible paths of logically dependent gates, each of which is a critical path for a different metric. Gate-aware depth is configured with weights of 1 and 0.1 for two- and one-qubit gates, respectively.

In practice, traditional depth is calculated by updating each qubit’s depth while sweeping through the circuit and taking the maximum of the final depths. Each time a gate appears on qubits $\{q_i, \cdots, q_j\}$, the tracker updates those qubits’ depths to

$$d_{\text{new}} = \max_{q_k \in \{q_i, \cdots, q_j\}} \left( d_{\text{prev}}[q_k] \right) + 1 \qquad (1)$$
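As an illustration of (1), the following minimal Python sketch assumes a hypothetical circuit representation: an ordered list of `(gate_name, qubits)` tuples. This representation is not the BQSKit, Qiskit, or TKET API, and the toy circuit is not the one in Fig. 2.

```python
def traditional_depth(circuit, num_qubits):
    """Traditional depth computed with the per-qubit sweep of Eq. (1).

    `circuit` is a hypothetical list of (gate_name, qubits) pairs in
    program order, where `qubits` is a tuple of qubit indices.
    """
    depths = [0] * num_qubits  # running depth of each qubit
    for _name, qubits in circuit:
        # Eq. (1): one layer past the deepest qubit the gate touches.
        new_depth = max(depths[q] for q in qubits) + 1
        for q in qubits:
            depths[q] = new_depth
    return max(depths, default=0)


# Toy 3-qubit circuit (illustrative only).
toy = [("sx", (0,)), ("cz", (0, 1)), ("sx", (2,)), ("cz", (1, 2)), ("x", (0,))]
print(traditional_depth(toy, 3))  # 3
```

Here one critical path is the `sx` on qubit 0 followed by the two `cz` gates, giving a depth of 3.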

Multi-qubit depth, also called entangling depth or CNOT depth, works similarly, but only counts the multi-qubit gates along a given path [45, 46]. Algorithmically, this can be accomplished by toggling the increment value in (1) between 0 and 1 for single- or multi-qubit gates, respectively. The circuit in Fig. 2 has a multi-qubit depth of 2 because the most multi-qubit gates lying along a path is 2. Note that both the dotted orange and alternating green paths achieve this maximum.
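Under the same hypothetical `(gate_name, qubits)` representation, the toggle described above changes only the increment. Treating any gate acting on two or more qubits as a multi-qubit gate is the assumption in this sketch.

```python
def multi_qubit_depth(circuit, num_qubits):
    """Multi-qubit depth: the sweep of Eq. (1) with a 0/1 increment."""
    depths = [0] * num_qubits
    for _name, qubits in circuit:
        # Single-qubit gates add 0; multi-qubit gates add 1.
        increment = 1 if len(qubits) > 1 else 0
        new_depth = max(depths[q] for q in qubits) + increment
        for q in qubits:
            depths[q] = new_depth
    return max(depths, default=0)


# Toy 3-qubit circuit (illustrative only).
toy = [("sx", (0,)), ("cz", (0, 1)), ("sx", (2,)), ("cz", (1, 2)), ("x", (0,))]
print(multi_qubit_depth(toy, 3))  # 2
```

Only the two `cz` gates along the longest path contribute, so the result is 2 regardless of how many single-qubit gates surround them.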

II-D Gate Times

For most current quantum devices, individual gate execution times vary with several factors, including the number of qubits a gate acts on [9, 10, 11], rotation gate angles [10], and the gate’s location on the device’s physical qubits [12].

While individual gate times vary, average gate times tend to be consistent across devices of similar design. For example, both IBM and IonQ devices implement $RZ$ gates virtually via phase propagation, meaning these gates do not contribute to circuit runtime on any of these devices [47, 48].

II-E Motivation

When measuring and comparing circuit runtimes, a metric’s accuracy and portability depend on the granularity of gate times it considers. Circuit scheduling and circuit depth represent the two extremes of incorporating every individual gate time or none at all, which results in correspondingly extreme sacrifices in accuracy or portability. Accordingly, a new metric that uses average gate times has the potential to increase accuracy over depth while maintaining portability across the family of devices where those averages reasonably hold.

III Gate-Aware Depth

Algorithm 1 shows the pseudo-code for our proposed metric, gate-aware depth. Like traditional depth, gate-aware depth sweeps through the circuit while updating each qubit’s depth as gates pass. However, gate-aware depth replaces traditional depth’s constant increment with a weight map $W_{\text{arch}}$ from the native gate set to weights.

Algorithm 1 Gate-Aware Depth
Input: quantum circuit $C$, architecture weight map $W_{\text{arch}}$
Output: gate-aware depth
qubit_depths = [0 for qubit in C.qubits]
for gate G in C do
    qubits = G.location
    new_depth = max(qubit_depths[q] for q in qubits) + W_arch(G)
    for q in qubits do
        qubit_depths[q] = new_depth
    end for
end for
return max(qubit_depths)
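A runnable Python sketch of Algorithm 1 follows. It assumes a hypothetical list of `(gate_name, qubits)` tuples for the circuit and a plain dictionary for $W_{\text{arch}}$; the weights used below are the illustrative two- and one-qubit weights of 1 and 0.1 from Fig. 2, not an IBM weight map.

```python
def gate_aware_depth(circuit, num_qubits, weights):
    """Algorithm 1: sweep the circuit, adding W_arch(G) instead of 1.

    `weights` maps each native gate name to its weight W_arch(G).
    """
    depths = [0.0] * num_qubits  # running weighted depth of each qubit
    for name, qubits in circuit:
        new_depth = max(depths[q] for q in qubits) + weights[name]
        for q in qubits:
            depths[q] = new_depth
    return max(depths, default=0.0)


weights = {"cz": 1.0, "sx": 0.1, "x": 0.1}
toy = [("sx", (0,)), ("cz", (0, 1)), ("sx", (2,)), ("cz", (1, 2)), ("x", (0,))]
print(gate_aware_depth(toy, 3, weights))
```

The heaviest path in this toy circuit is the `sx` on qubit 0 followed by both `cz` gates, with weight 0.1 + 1 + 1 = 2.1 (up to floating-point rounding).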

The weight map $W_{\text{arch}}$ is configured using the target architecture’s average gate times. For an architecture with native gate set $S$ and gate $G^{*} \in S$, we define

$$W_{\text{arch}}(G^{*}) = \frac{\text{average gate time of } G^{*}}{\max_{G \in S}\left(\text{average gate time of } G\right)} \qquad (2)$$
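Equation (2) is a simple normalization, sketched below in Python. The average gate times in the example are hypothetical round numbers chosen for readability, not measured IBM values.

```python
def weight_map(avg_gate_times):
    """Eq. (2): divide each native gate's average time by the largest one."""
    longest = max(avg_gate_times.values())
    return {gate: t / longest for gate, t in avg_gate_times.items()}


# Hypothetical round-number averages (ns), not measured IBM values.
print(weight_map({"cz": 100.0, "sx": 50.0, "x": 50.0, "rz": 0.0}))
# {'cz': 1.0, 'sx': 0.5, 'x': 0.5, 'rz': 0.0}
```

Because the weights are ratios, the unit of the average times cancels, and the longest native gate always receives weight 1.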

Fig. 2 depicts the calculation of gate-aware depth for an example circuit. Under the given weight map, the alternating green path has the highest weighted sum of gates, which in turn defines the circuit’s gate-aware depth to be 2.1.
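The value 2.1 can be checked by hand. The green path attains the multi-qubit depth of 2 and, not being the critical path for traditional depth, contains fewer than 4 gates; a weighted sum of 2.1 therefore corresponds to two two-qubit gates and one single-qubit gate:

$$W(\text{green path}) = 2 \times 1 + 1 \times 0.1 = 2.1$$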

Gate-aware depth bears similarities to the use of cycles per instruction (CPI) in classical processor performance assessment, which accounts for both the CPU instruction count and the instruction mix [49].

IV Experimental Setup

Our experiment consists of three phases: compiling a standard circuit test suite for a target device using different algorithms, obtaining the compiled versions’ metric values and runtimes, and assessing each metric’s accuracy at 1) predicting relative differences in runtime and 2) identifying runtime-optimal circuit versions. We repeated this experiment for IBM’s two available superconducting architectures, the 127-qubit Eagle and 156-qubit Heron, using three devices for each architecture.

IV-A Circuit Test Suite

The circuit test suite consisted of 15 real quantum programs from 4 to 64 qubits that are commonly used for compiler benchmarking. We included 4-, 8-, and 16-qubit versions of the VQE [50] and QAOA [51] algorithms, as well as 4-, 8-, 16-, and 32-qubit Hamiltonian simulation circuits [52]. For scale, we additionally included QFT circuits of size 4, 8, 16, 32, and 64. The QAOA, VQE, and Hamiltonian simulation circuits were generated using SupermarQ [53], while the QFT circuits were generated using Qiskit [37].

IV-B Compilation

The test suite circuits were compiled using four different algorithms: SABRE [54], single-qubit gates matter (SQGM) [24], Qiskit’s default transpiler pass at optimization level 3 [37], and a custom pass in TKET [38]. Each circuit was compiled 5 times by each algorithm, and the optimized output with the lowest traditional depth was kept. Since SABRE and SQGM map circuits without rebasing to a device’s native gate set, all optimized circuits were subsequently translated for each device using Qiskit’s default transpiler pass at optimization level 0 (i.e., without unnecessary optimization) to control for the effects of rebasing on depth.

IV-C Metric and Runtime Implementation

To obtain circuit depths, we implemented gate-aware depth in the BQSKit framework [39] and used it alongside the pre-existing functions for traditional and multi-qubit depth.

To obtain circuit runtime, we built a custom runtime estimator in BQSKit. The estimator uses Algorithm 1, replacing the average weight map $W_{\text{arch}}$ with the exact runtimes of all individual gates on the device. With this modification, the estimator produces the same runtime as a true circuit scheduler.
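The estimator's modification can be sketched in Python as below. This is an illustrative reconstruction, not the authors' BQSKit code: it assumes a hypothetical dictionary of exact gate times keyed by `(gate_name, qubits)` and considers only qubit dependencies, ignoring measurements, delays, and any other scheduling constraints.

```python
def estimate_runtime(circuit, num_qubits, gate_times):
    """Algorithm 1 with W_arch(G) replaced by the exact time of G at its location.

    `gate_times` is a hypothetical dict {(gate_name, qubits): duration};
    each gate starts as soon as all of its qubits are free (ASAP).
    """
    finish = [0.0] * num_qubits  # time at which each qubit becomes free
    for name, qubits in circuit:
        start = max(finish[q] for q in qubits)
        end = start + gate_times[(name, qubits)]
        for q in qubits:
            finish[q] = end
    return max(finish, default=0.0)


# Hypothetical per-location durations in ns (not IBM calibration data).
gate_times = {("sx", (0,)): 30, ("sx", (1,)): 40, ("cz", (0, 1)): 70}
circuit = [("sx", (0,)), ("cz", (0, 1)), ("sx", (1,))]
print(estimate_runtime(circuit, 2, gate_times))  # 140.0
```

Keying by the ordered qubit tuple keeps direction-dependent gates such as $ECR$ on $(q_0, q_1)$ and $(q_1, q_0)$ distinct.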

The gate times themselves were accessed through the IBM backends’ instruction_durations property, which specifies device gate times by both gate operation and qubit location. They are available through the Qiskit IBM runtime package [37].

We used the runtime estimator because pulse scheduling was deprecated in Qiskit 1.3 and is not publicly available on the newest Heron architecture. On Eagle devices, where the estimator can be verified, its estimates match the pulse scheduler’s runtime almost exactly (see Section V-A).

IV-D Measures of Accuracy

IV-D1 Predicting Relative Runtime Differences

We measure a metric’s accuracy for predicting relative differences in runtime between compiled circuit versions using percent relative error (%RE). For circuits $C_1$ and $C_2$ which are different compiled versions of the same base circuit, we calculated the relative differences in the metric $\Delta D$ and runtime $\Delta R$ using the equations

$$\Delta D = \left[\text{depth}(C_1) - \text{depth}(C_2)\right] / \text{depth}(C_2) \qquad (3)$$
$$\Delta R = \left[\text{runtime}(C_1) - \text{runtime}(C_2)\right] / \text{runtime}(C_2) \qquad (4)$$

Researchers use $\Delta D$ to predict $\Delta R$, so we calculate the %RE of that prediction as

$$\%\text{RE} = |\Delta D - \Delta R| \,/\, |\Delta R| \times 100 \qquad (5)$$

This yields the metric’s error in predicting the relative difference in runtime between a single pair of compiled circuit versions. By repeating this calculation for many pairs of compiled circuit versions, we create a distribution of errors that captures the metric’s overall performance.
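Equations (3) to (5) translate directly into Python; the depth and runtime values in the example are hypothetical.

```python
def relative_difference(value_1, value_2):
    """Eqs. (3)-(4): relative difference of C1 with respect to C2."""
    return (value_1 - value_2) / value_2


def percent_relative_error(depth_1, depth_2, runtime_1, runtime_2):
    """Eq. (5): %RE of using Delta D to predict Delta R.

    Undefined (raises ZeroDivisionError) when the two runtimes are equal.
    """
    delta_d = relative_difference(depth_1, depth_2)
    delta_r = relative_difference(runtime_1, runtime_2)
    return abs(delta_d - delta_r) / abs(delta_r) * 100


# Hypothetical pair: the metric drops by 10% while runtime rises by 5%.
print(percent_relative_error(90, 100, 105, 100))
```

This prints a value of about 300: because $\Delta D = -0.1$ and $\Delta R = 0.05$ have opposite signs, the error exceeds 100%.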

IV-D2 Identifying Runtime-Optimal Circuit Versions

We measure a metric’s accuracy for identifying runtime-optimal circuit versions using the percentage of correct identifications. An identification is correct if, for a given metric, the sets of compiled versions with the minimum metric value and minimum runtime match. This provides the accuracy rate for a common use case in compiler comparison in which the algorithm which minimizes a circuit’s depth is presumed to have minimized its runtime too.
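The set-matching criterion can be sketched as follows. The compiler names mirror Section IV-B, but the metric and runtime values are hypothetical; note how a tie in the metric makes the identification incorrect.

```python
def argmin_set(values):
    """Set of compiled versions attaining the minimum value (ties kept)."""
    best = min(values.values())
    return {version for version, v in values.items() if v == best}


def is_correct_identification(metric_values, runtimes):
    """Correct iff the minimum-metric and minimum-runtime sets match."""
    return argmin_set(metric_values) == argmin_set(runtimes)


# Hypothetical values for one base circuit compiled four ways.
runtimes = {"SABRE": 5.2, "SQGM": 4.9, "Qiskit": 5.0, "TKET": 5.4}
metric = {"SABRE": 12, "SQGM": 10, "Qiskit": 10, "TKET": 11}
print(is_correct_identification(metric, runtimes))  # False: SQGM ties with Qiskit
```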

IV-E Platform

All tests were conducted with Python 3.13.2 on a 4-core AMD Ryzen 5 3500U with 5.66 GiB main memory running Manjaro 6.6.80. Because SQGM builds on SABRE as implemented in Qiskit version 0.33.0, we used version 0.46.3 of Qiskit Terra for compatibility when compiling with these algorithms. Otherwise, we used versions 0.5.38, 1.4.1, 0.36.1, 1.2.0, and 2.0.1 for the SupermarQ, Qiskit, Qiskit IBM runtime, BQSKit, and PyTKET packages, respectively.

V Results

All experimental source code and results are available at https://github.com/mtkgv/cdaa. Additionally, the circuit runtime estimator has been made available as a stand-alone tool at https://github.com/mtkgv/qcre.

V-A Runtime Estimator Verification

We verified the accuracy of our custom runtime estimator by comparing it against the Qiskit pulse scheduler. Since only Eagle backends support pulse scheduling, we compiled the 15 test circuits for the IBM Sherbrooke, an Eagle-model device, using SABRE, SQGM, and Qiskit. We then compared our estimated runtime against the pulse schedule’s total duration.

For 30 of the 45 compiled circuits, the difference between the estimated runtime and pulse schedule duration was exactly 0 seconds. For the remaining 15 circuits, the largest difference was $3.5 \times 10^{-16}$ seconds, for a circuit with a true runtime of $1.8 \times 10^{-3}$ seconds, which is explainable by floating-point error. We therefore conclude that our estimated runtime matches true runtime extremely accurately.

V-B Architecture Weight Maps

We configured the gate-aware depth weight maps for our two target architectures, the IBM Eagle and Heron, by taking average gate times over 3 devices each. For the Eagle, we took average gate times over the Sherbrooke, Kyiv, and Brisbane devices; and for the Heron, we took average gate times over the Marrakesh, Kingston, and Aachen devices. The resulting weight maps are shown in Table I.

TABLE I: Gate-Aware Depth Weight Maps for IBM Eagle and Heron Architectures

| $G$ | $ECR$ | $CZ$ | $RZ$ | $SX$ | $X$ |
|---|---|---|---|---|---|
| $W_{\text{Eagle}}(G)$ | 1.000 | n/a | 0.000 | 0.0942 | 0.0942 |
| $W_{\text{Heron}}(G)$ | n/a | 1.000 | 0.000 | 0.483 | 0.483 |
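One way to form these per-architecture averages is sketched below. The text does not specify whether gate times are pooled over all locations on all devices or averaged per device first; this sketch assumes pooling, and the device names and durations are hypothetical. The resulting averages would then be normalized with (2).

```python
from statistics import mean


def architecture_average_times(device_gate_times):
    """Average each native gate's duration over all locations on all devices.

    `device_gate_times` maps a device name to {(gate_name, qubits): duration}.
    Pooling all locations is an assumption, not the documented procedure.
    """
    pooled = {}
    for times in device_gate_times.values():
        for (gate, _qubits), duration in times.items():
            pooled.setdefault(gate, []).append(duration)
    return {gate: mean(durations) for gate, durations in pooled.items()}


# Hypothetical durations in ns for two devices of one architecture.
devices = {
    "device_a": {("cz", (0, 1)): 100, ("sx", (0,)): 40, ("sx", (1,)): 60},
    "device_b": {("cz", (0, 1)): 120, ("sx", (0,)): 50},
}
print(architecture_average_times(devices))
```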

V-C Case Study: QFT4, SQGM v. Qiskit

Before comparing the metrics’ overall accuracies, we first present a single-comparison case study using the 4-qubit QFT circuits compiled by SQGM and Qiskit for the IBM Marrakesh. Fig. 3 illustrates the proportional changes $\Delta D$ and $\Delta R$ between these two compiled versions.


Figure 3: Proportional change in metrics and runtime of the Qiskit-compiled QFT4 relative to the SQGM-compiled QFT4. The true proportional change in runtime is indicated by the dotted green line.

This comparison illustrates a “worst-case” scenario in which traditional and multi-qubit depth decreased from the SQGM version to the Qiskit version (indicated by negative $\Delta D$ values) while runtime increased (indicated by the positive $\Delta R$ value). In contrast, gate-aware depth increased, making it the only metric to correctly predict the increase in runtime.

The %RE outlined in Section IV-D captures these observations numerically: %RE is higher when $\Delta D$ and $\Delta R$ are far apart, and in particular is at least 100% when their signs differ, since then $|\Delta D - \Delta R| = |\Delta D| + |\Delta R| \geq |\Delta R|$. When comparing this pair of circuit versions, traditional, multi-qubit, and gate-aware depth have %REs of 794%, 199%, and 11.7%, respectively.

V-D Accuracy for Predicting Relative Differences in Runtime

Next, we compare the metrics’ overall accuracies for predicting relative differences in runtime by obtaining a distribution of the %REs for all possible pairwise comparisons. With 15 circuits and 4 compilers, there were $\binom{4}{2} \times 15 = 90$ possible pairwise comparisons, and thus $n = 90$ data points, for each metric on every device.¹

¹For Eagle devices, our custom TKET compilation pass produced circuits that matched the coupling graph and gate set but occasionally reversed the qubit direction of available $ECR$ gates. The optimization level 0 Qiskit translation pass raised errors when encountering these reversed $ECR$s, preventing us from continuing our experimental procedure. For this reason, we excluded TKET-compiled circuits from our Eagle analyses, giving only $\binom{3}{2} \times 15 = 45$ pairwise comparisons for these devices. However, the observed trends remained similar to those on Heron devices, where all four compilers were included.

Fig. 4 shows the %RE distributions by metric. On all devices, gate-aware depth produced the lowest median %RE, followed by multi-qubit depth and then traditional depth. On average across all devices, gate-aware depth reduced the median %RE by factors of 64 and 18 relative to traditional and multi-qubit depth, respectively.


Figure 4: Distribution of %REs for predictions of relative runtime changes.

For traditional depth, the third quartile exceeds 100% RE on all devices. This means that, for at least one in four runtime predictions, the runtime change predicted by traditional depth differs from the true change by more than the size of the true change itself. Multi-qubit depth performed only slightly better, with an average third quartile of 77.9% RE. In comparison, gate-aware depth’s third quartile was below 10% RE on five of the six devices, the exception being the IBM Aachen. For those five devices, at least three out of four predicted runtime differences deviated from the true runtime difference by less than 10%. For gate-aware depth, only outliers exceeded the 100% RE threshold that traditional depth regularly crossed.

Gate-aware depth reduced the median %RE and the %RE interquartile range less for the IBM Aachen because this device has greater variability in gate execution times. This variation makes runtimes more sensitive to the circuit’s placement on the physical hardware, which gate-aware depth does not account for.

V-E Accuracy for Identifying Runtime-Optimal Circuit Versions

To compare the metrics’ accuracies for identifying runtime-optimal circuit versions, we made 15 identifications using each metric on every device, one for each base circuit in the test suite.

Fig. 5 shows the percentage of correct identifications by metric. Gate-aware depth made the most correct identifications on all devices, followed by traditional depth with the second-most and multi-qubit depth with the fewest (or tied for second). On average, gate-aware depth increased the percentage of correct identifications by 20 and 43 percentage points over traditional and multi-qubit depth, respectively. On five of the six devices, gate-aware depth achieved a perfect 100% accuracy rate, which no other metric achieved.


Figure 5: Percentage of circuits with shortest-runtime version correctly identified.

Despite having the second-highest accuracy in predicting relative differences in runtime (see Section V-D), multi-qubit depth had the lowest accuracy rate for identifying runtime-optimal circuit versions. A closer look at the Marrakesh identifications revealed that, for 8 of the 10 incorrect identifications, the runtime-optimal compiled version tied with other versions for the lowest multi-qubit depth, which counts as incorrect. Ties are more likely for multi-qubit depth than for the other metrics because it discards all information about single-qubit gates, eliminating a potential source of differentiation.

VI Discussion

VI-A Weight Selection

We verified that the weight selection method given by (2) chooses good values by varying the weight maps manually and checking the resulting accuracy for runtime predictions. Since the main difference between the weight maps in Table I is the weight of the non-$RZ$ single-qubit gates, we parameterized these weights by $w_s$ and tested values from 0 to 1 in increments of 0.01. The resulting weight maps are given in Table II.

TABLE II: Manual Verification Weight Maps

| $G$ | $ECR$ | $CZ$ | $RZ$ | $SX$ | $X$ |
|---|---|---|---|---|---|
| $W'_{\text{Eagle}}(G)$ | 1.000 | n/a | 0.000 | $w_s$ | $w_s$ |
| $W'_{\text{Heron}}(G)$ | n/a | 1.000 | 0.000 | $w_s$ | $w_s$ |
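The manual sweep over $w_s$ amounts to a grid search, sketched below under stated assumptions: `evaluate_re` is a hypothetical callable returning the pairwise %REs obtained with the weight map $W'$ at a given $w_s$, and the scorer in the example is synthetic, not experimental data.

```python
from statistics import median


def sweep_single_qubit_weight(evaluate_re):
    """Try w_s = 0.00, 0.01, ..., 1.00 and return (best w_s, its median %RE)."""
    best_w, best_score = None, float("inf")
    for i in range(101):
        w_s = i / 100
        score = median(evaluate_re(w_s))
        if score < best_score:  # keep the first minimizer on ties
            best_w, best_score = w_s, score
    return best_w, best_score


# Synthetic scorer whose %REs are smallest at w_s = 0.48 (illustrative only).
def fake_evaluate_re(w_s):
    return [abs(w_s - 0.48) * 100] * 3


print(sweep_single_qubit_weight(fake_evaluate_re))
```

Running one such sweep per device yields a per-device minimizer of the kind marked by the thin vertical lines in Fig. 6.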

Fig. 6 plots gate-aware depth’s median %RE for predicting relative differences in runtime against the weight $w_s$. As expected, the values of $w_s$ that minimize median %RE for each device cluster by architecture, with Eagle and Heron devices having optimal weights around 0.1 and 0.5, respectively. The weights calculated in Section V-B, shown by the thick vertical grey lines, fall in the middle of the cluster corresponding to their architecture. This shows that the weights given by (2) accurately capture gate time characteristics that are shared by devices of the same architecture.


Figure 6: Gate-aware depth’s median %RE for different values of the non-$RZ$ single-qubit gate weight $w_s$. The thin vertical lines show the value of $w_s$ that minimizes median %RE for a given device, and the thick ones show the values calculated for the Eagle and Heron architectures in Section V-B.

VI-B Application to Compiler Comparison

Researchers often compare compiler algorithms by compiling the same suite of test circuits using each algorithm and comparing the depths of the corresponding circuits, with greater relative reductions in depth indicating greater optimization ability. However, our results demonstrate that the two most commonly used metrics, traditional and multi-qubit depth, are relatively inaccurate proxies for the true objective they aim to represent: runtime. Consequently, they provide inaccurate assessments of the relative performance of compilers. In comparison, gate-aware depth provides greater accuracy while still maintaining portability across devices of the same architecture.

If inter-architecture portability is required, our results show that traditional depth is better able to identify runtime-optimal circuit versions because it accounts for all gates, while multi-qubit depth better predicts relative differences in runtime because two-qubit gates dominate circuit execution times in today’s devices.

VI-C Non-Superconducting Devices

Although gate-aware depth is highly successful for the superconducting quantum devices tested, every qubit technology introduces changes in the process of scheduling and running quantum circuits that may affect its performance. For example, trapped-ion devices require additional shuttling time to shift ions around the trap, an operation which the circuit model fails to capture [55, 56]. We limited our experiment to IBM’s superconducting devices because they offered direct access to circuit scheduling and runtime, but future works could extend the approach to other devices and technologies.

VII Conclusion

In this paper, we show that circuit depth is inaccurate for comparing quantum circuit runtimes and propose a new metric that increases accuracy while maintaining hardware-agnosticism for devices of the same architecture. To do so, we identified variation in gate execution times as an underlying cause of the accuracy-portability tradeoff, and, in response, designed our metric to use average gate times for a given architecture. This approach achieves a middle-ground between the high portability of circuit depth and the high accuracy of circuit scheduling, thereby filling a gap in existing runtime comparison methods. We discuss the application of these findings to quantum compilation, and finally provide weight configurations for use with the current IBM Eagle and Heron architectures.

Acknowledgments

This material is based upon work supported by the DOE-SC Office of Advanced Scientific Computing Research MACH-Q project under contract number DE-AC02-06CH11357. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.

References

  • [1] P. W. Shor, “Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer,” SIAM Journal on Computing, vol. 26, no. 5, pp. 1484–1509, 1997.
  • [2] L. K. Grover, “A fast quantum mechanical algorithm for database search,” in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’96, (New York, NY, USA), pp. 212–219, Association for Computing Machinery, 1996.
  • [3] B. Khanal and P. Rivas, “Evaluating the impact of noise on variational quantum circuits in NISQ era devices,” in 2023 Congress in Computer Science, Computer Engineering, Applied Computing (CSCE), pp. 1658–1664, 2023.
  • [4] T. D. Ladd, F. Jelezko, R. Laflamme, Y. Nakamura, C. Monroe, and J. L. O’Brien, “Quantum computers,” Nature, vol. 464, pp. 45–53, Mar. 2010.
  • [5] A. A. Saki, M. Alam, and S. Ghosh, “Impact of noise on the resilience and the security of quantum computing,” in 2021 22nd International Symposium on Quality Electronic Design (ISQED), pp. 186–191, 2021.
  • [6] T. Alexander, N. Kanazawa, D. J. Egger, L. Capelluto, C. J. Wood, A. Javadi-Abhari, and D. C. McKay, “Qiskit pulse: programming quantum computers through the cloud with pulses,” Quantum Science and Technology, vol. 5, p. 044006, Aug. 2020.
  • [7] T. Nguyen and A. McCaskey, “Enabling pulse-level programming, compilation, and execution in XACC,” IEEE Transactions on Computers, vol. 71, no. 3, pp. 547–558, 2022.
  • [8] M. Nielsen and I. Chuang, Quantum Computation and Quantum Information. Cambridge University Press, 2010.
  • [9] J.-S. Chen, E. Nielsen, M. Ebert, V. Inlek, K. Wright, V. Chaplin, A. Maksymov, E. Páez, A. Poudel, P. Maunz, and J. Gamble, “Benchmarking a trapped-ion quantum computer with 30 qubits,” Quantum, vol. 8, p. 1516, Nov. 2024.
  • [10] R. A. et al., “Quantum error correction below the surface code threshold,” Nature, vol. 638, pp. 920–926, Dec. 2024.
  • [11] J. M. Pino, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, M. S. Allman, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson, and B. Neyenhuis, “Demonstration of the trapped-ion quantum CCD computer architecture,” Nature, vol. 592, pp. 209–213, Apr. 2021.
  • [12] IBM, “Quantum processing units.” https://quantum.ibm.com/services/resources (visited on Apr. 13, 2025).
  • [13] P. Li, J. Liu, A. Gonzales, Z. H. Saleem, H. Zhou, and P. Hovland, “QuTracer: Mitigating quantum gate and measurement errors by tracing subsets of qubits,” in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 103–117, IEEE, 2024.
  • [14] J. Liu, A. Gonzales, B. Huang, Z. H. Saleem, and P. Hovland, “QuCLEAR: Clifford extraction and absorption for quantum circuit optimization,” in 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 158–172, IEEE, 2025.
  • [15] Z. Liang, Z. Song, J. Cheng, Z. He, J. Liu, H. Wang, R. Qin, Y. Wang, S. Han, X. Qian, et al., “Hybrid gate-pulse model for variational quantum algorithms,” in 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1–6, IEEE, 2023.
  • [16] C. Campbell, F. T. Chong, D. Dahl, P. Frederick, P. Goiporia, P. Gokhale, B. Hall, S. Issa, E. Jones, S. Lee, et al., “Superstaq: Deep optimization of quantum programs,” in 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), vol. 1, pp. 1020–1032, IEEE, 2023.
  • [17] J. Liu, M. Bowman, P. Gokhale, S. Dangwal, J. Larson, F. T. Chong, and P. D. Hovland, “QContext: Context-aware decomposition for quantum gates,” in 2023 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–5, IEEE, 2023.
  • [18] J. Liu, L. Bello, and H. Zhou, “Relaxed peephole optimization: A novel compiler optimization for quantum circuits,” in 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 301–314, IEEE, 2021.
  • [19] Y. Jin, Z. Li, F. Hua, T. Hao, H. Zhou, Y. Huang, and E. Z. Zhang, “Tetris: A compilation framework for VQA applications in quantum computing,” in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 277–292, IEEE, 2024.
  • [20] E. Younis, K. Sen, K. Yelick, and C. Iancu, “QFAST: Conflating search and numerical optimization for scalable quantum circuit synthesis,” in 2021 IEEE International Conference on Quantum Computing and Engineering (QCE), pp. 232–243, IEEE, 2021.
  • [21] N. Nottingham, M. A. Perlin, R. White, H. Bernien, F. T. Chong, and J. M. Baker, “Decomposing and routing quantum circuits under constraints for neutral atom architectures,” arXiv e-prints, pp. arXiv–2307, 2023.
  • [22] A. Xu, A. Molavi, S. Tannu, and A. Albarghouthi, “Optimizing quantum circuits, fast and slow,” in Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pp. 777–793, 2025.
  • [23] M. Xu, Z. Li, O. Padon, S. Lin, J. Pointing, A. Hirth, H. Ma, J. Palsberg, A. Aiken, U. A. Acar, et al., “Quartz: superoptimization of quantum circuits,” in Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pp. 625–640, 2022.
  • [24] S. Li, K. D. Nguyen, Z. Clare, and Y. Feng, “Single-qubit gates matter for optimising quantum circuit depth in qubit mapping,” in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), pp. 1–9, 2023.
  • [25] T. Fösel, M. Y. Niu, F. Marquardt, and L. Li, “Quantum circuit optimization with deep reinforcement learning,” arXiv preprint arXiv:2103.07585, 2021.
  • [26] F. J. Ruiz, T. Laakkonen, J. Bausch, M. Balog, M. Barekatain, F. J. Heras, A. Novikov, N. Fitzpatrick, B. Romera-Paredes, J. van de Wetering, et al., “Quantum circuit optimization with AlphaTensor,” Nature Machine Intelligence, pp. 1–12, 2025.
  • [27] Y. Nam, N. J. Ross, Y. Su, A. M. Childs, and D. Maslov, “Automated optimization of large quantum circuits with continuous parameters,” npj Quantum Information, vol. 4, no. 1, p. 23, 2018.
  • [28] X. Cao, J. Zhou, Y. Liu, Y. Shi, and G. Li, “MarQSim: Reconciling determinism and randomness in compiler optimization for quantum simulation,” arXiv preprint arXiv:2408.03429, 2024.
  • [29] K. Hietala, R. Rand, S.-H. Hung, X. Wu, and M. Hicks, “A verified optimizer for quantum circuits,” Proceedings of the ACM on Programming Languages, vol. 5, no. POPL, pp. 1–29, 2021.
  • [30] J. Paykin, A. T. Schmitz, M. Ibrahim, X.-C. Wu, and A. Y. Matsuura, “PCOAST: A Pauli-based quantum circuit optimization framework,” in 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), vol. 1, pp. 715–726, IEEE, 2023.
  • [31] J. Liu, E. Younis, M. Weiden, P. Hovland, J. Kubiatowicz, and C. Iancu, “Tackling the qubit mapping problem with permutation-aware synthesis,” in 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), vol. 1, pp. 745–756, IEEE, 2023.
  • [32] J. Liu, P. Li, and H. Zhou, “Not all swaps have the same cost: A case for optimization-aware qubit routing,” in 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 709–725, IEEE, 2022.
  • [33] W. Tang, Y. Duan, Y. Kharkov, R. Fakoor, E. Kessler, and Y. Shi, “AlphaRouter: Quantum circuit routing with reinforcement learning and tree search,” in 2024 IEEE International Conference on Quantum Computing and Engineering (QCE), vol. 1, pp. 930–940, IEEE, 2024.
  • [34] A. Molavi, A. Xu, M. Diges, L. Pick, S. Tannu, and A. Albarghouthi, “Qubit mapping and routing via MaxSAT,” in 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1078–1091, IEEE, 2022.
  • [35] T. G. de Brugière, M. Baboulin, B. Valiron, S. Martiel, and C. Allouche, “Reducing the depth of linear reversible quantum circuits,” IEEE Transactions on Quantum Engineering, vol. 2, pp. 1–22, 2021.
  • [36] S. Ganjam, Y. Wang, Y. Lu, A. Banerjee, C. U. Lei, L. Krayzman, K. Kisslinger, C. Zhou, R. Li, Y. Jia, M. Liu, L. Frunzio, and R. J. Schoelkopf, “Surpassing millisecond coherence in on chip superconducting quantum memories by optimizing materials and circuit design,” Nature Communications, vol. 15, p. 3687, May 2024.
  • [37] A. Javadi-Abhari, M. Treinish, K. Krsulich, C. J. Wood, J. Lishman, J. Gacon, S. Martiel, P. D. Nation, L. S. Bishop, A. W. Cross, B. R. Johnson, and J. M. Gambetta, “Quantum computing with Qiskit,” 2024.
  • [38] S. Sivarajah, S. Dilkes, A. Cowtan, W. Simmons, A. Edgington, and R. Duncan, “t|ket⟩: a retargetable compiler for NISQ devices,” Quantum Science and Technology, vol. 6, p. 014003, Nov. 2020.
  • [39] E. Younis, C. C. Iancu, W. Lavrijsen, M. Davis, E. Smith, and USDOE, “Berkeley quantum synthesis toolkit (BQSKit) v1,” Apr. 2021.
  • [40] P. D. Nation, A. A. Saki, S. Brandhofer, L. Bello, S. Garion, M. Treinish, and A. Javadi-Abhari, “Benchmarking the performance of quantum computing software,” 2025.
  • [41] H. Zou, M. Treinish, K. Hartman, A. Ivrii, and J. Lishman, “LightSABRE: A lightweight and enhanced SABRE algorithm,” 2024.
  • [42] H. Silvério, S. Grijalva, C. Dalyac, L. Leclerc, P. J. Karalekas, N. Shammah, M. Beji, L.-P. Henry, and L. Henriet, “Pulser: An open-source package for the design of pulse sequences in programmable neutral-atom arrays,” Quantum, vol. 6, p. 629, Jan. 2022.
  • [43] Rigetti Computing, “Quil-T.” https://pyquil-docs.rigetti.com/en/stable/quilt.html (visited on Apr. 12, 2025).
  • [44] M. Werninghaus, D. J. Egger, F. Roy, S. Machnes, F. K. Wilhelm, and S. Filipp, “Leakage reduction in fast superconducting qubit gates via optimal control,” npj Quantum Information, vol. 7, p. 14, Jan 2021.
  • [45] M. Remaud and V. Vandaele, “Ancilla-free quantum adder with sublinear depth,” 2025.
  • [46] T. G. de Brugière and S. Martiel, “Faster and shorter synthesis of hamiltonian simulation circuits,” 2024.
  • [47] IBM, “RZGate.” https://docs.quantum.ibm.com/api/qiskit/qiskit.circuit.library.RZGate (visited on Apr. 12, 2025).
  • [48] IonQ, “Native gates.” https://docs.ionq.com/guides/getting-started-with-native-gates (visited on Apr. 12, 2025).
  • [49] J. L. Hennessy and D. A. Patterson, Computer Architecture, ch. Fundamentals of Quantitative Design and Analysis, pp. 49–50. The Morgan Kaufmann Series in Computer Architecture and Design, Oxford, England: Morgan Kaufmann, 5th ed., Sept. 2011.
  • [50] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien, “A variational eigenvalue solver on a photonic quantum processor,” Nature Communications, vol. 5, no. 1, p. 4213, 2014.
  • [51] E. Farhi, J. Goldstone, and S. Gutmann, “A quantum approximate optimization algorithm,” arXiv preprint arXiv:1411.4028, 2014.
  • [52] L. Bassman Oftelie, K. Liu, A. Krishnamoorthy, T. Linker, Y. Geng, D. Shebib, S. Fukushima, F. Shimojo, R. K. Kalia, A. Nakano, et al., “Towards simulation of the dynamics of materials on quantum computers,” Physical Review B, vol. 101, no. 18, p. 184305, 2020.
  • [53] T. Tomesh, P. Gokhale, V. Omole, G. S. Ravi, K. N. Smith, J. Viszlai, X.-C. Wu, N. Hardavellas, M. R. Martonosi, and F. T. Chong, “SupermarQ: A scalable quantum benchmark suite,” in 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 587–603, 2022.
  • [54] G. Li, Y. Ding, and Y. Xie, “Tackling the qubit mapping problem for NISQ-era quantum devices,” in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’19, (New York, NY, USA), pp. 1001–1014, Association for Computing Machinery, 2019.
  • [55] D. Schoenberger, S. Hillmich, M. Brandl, and R. Wille, “Shuttling for scalable trapped-ion quantum computers,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 1–1, 2024.
  • [56] B. Bach, I. Safro, and E. Younis, “Efficient compilation for shuttling trapped-ion machines via the position graph architectural abstraction,” arXiv preprint arXiv:2501.12470, 2025.

The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan http://energy.gov/downloads/doe-public-access-plan.