Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11065 publications
    CrossCheck: Input Validation for WAN Control Systems
    Rishabh Iyer
    Isaac Keslassy
    Sylvia Ratnasamy
    Networked Systems Design and Implementation (NSDI) (2026) (to appear)
    Preview abstract We present CrossCheck, a system that validates inputs to the Software-Defined Networking (SDN) controller in a Wide Area Network (WAN). By detecting incorrect inputs—often stemming from bugs in the SDN control infrastructure—CrossCheck alerts operators before they trigger network outages. Our analysis at a large-scale WAN operator identifies invalid inputs as a leading cause of major outages, and we show how CrossCheck would have prevented those incidents. We deployed CrossCheck as a shadow validation system for four weeks in a production WAN, during which it accurately detected the single incident of invalid inputs that occurred while sustaining a 0% false positive rate under normal operation, hence imposing little additional burden on operators. In addition, we show through simulation that CrossCheck reliably detects a wide range of invalid inputs (e.g., detecting demand perturbations as small as 5% with 100% accuracy) and maintains a near-zero false positive rate for realistic levels of noisy, missing, or buggy telemetry data (e.g., sustaining zero false positives with up to 30% of corrupted telemetry data). View details
    FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
    Diganta Misra
    Yanqi Luo
    Anjali Sridhar
    Justine Gehring
    Silvio Soares Ribeiro Junior
    2026
    Preview abstract AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization. View details
    Preview abstract How many T gates are needed to approximate an arbitrary n-qubit quantum state to within a given precision ϵ? Improving prior work of Low, Kliuchnikov and Schaeffer, we show that the optimal asymptotic scaling is Θ(sqrt{2^n log(1/ε)} + log(1/ε)) if we allow an unlimited number of ancilla qubits. We also show that this is the optimal T-count for implementing an arbitrary diagonal n-qubit unitary to within error ϵ. We describe an application to batched synthesis of single-qubit unitaries: we can approximate a tensor product of m = O(log log(1/ϵ)) arbitrary single-qubit unitaries to within error ϵ with the same asymptotic T-count as is required to approximate just one single-qubit unitary. View details
    Preview abstract Semantic data models express high-level business concepts and metrics, capturing the business logic needed to query a database correctly. Most data modeling solutions are built as layers above SQL query engines, with bespoke query languages or APIs. The layered approach means that semantic models can’t be used directly in SQL queries. This paper focuses on an open problem in this space – can we define semantic models in SQL, and make them naturally queryable in SQL? In parallel, graph query is becoming increasingly popular, including in SQL. SQL/PGQ extends SQL with an embedded subset of the GQL graph query language, adding property graph views and making graph traversal queries easy. We explore a surprising connection: semantic data models are graphs, and defining graphs is a data modeling problem. In both domains, users start by defining a graph model, and need query language support to easily traverse edges in the graph, which means doing joins in the underlying data. We propose some useful SQL extensions that make it easier to use higher-level data model abstractions in queries. Users can define a “semantic data graph” view of their data, encapsulating the complex business logic required to query the underlying tables correctly. Then they can query that semantic graph model easily with SQL. Our SQL extensions are useful independently, simplifying many queries – particularly, queries with joins. We make declared foreign key relationships usable for joins at query time – a feature that seems obvious but is notably missing in standard SQL. In combination, these extensions provide a practical approach to extend SQL incrementally, bringing semantic modeling and graph query together with the relational model and SQL. View details
    Productionizing Quantum Mass Production
    Bill Huggins
    Nathan Wiebe
    arXiv for now (2026) (to appear)
    Preview abstract For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using ``quantum read-only memory.'' We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order or magnitude or more for a variety reasonably-sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step. View details
    LeakyFeeder: In-Air Gesture Control Through Leaky Acoustic Waves
    Yongjie Yang
    Tao Chen
    Zhenlin An
    Shirui Cao
    Shangguan Longfei
    SenSys 2025 - The 23rd ACM Conference on Embedded Networked Sensor Systems (2025)
    Preview abstract We present LeekyFeeder, a mobile application that explores the acoustic signals leaked from headphones to reconstruct gesture motions around the ear for fine-grained gesture control. To achieve this goal, LeekyFeeder reuses the speaker and feed-forward microphones on active noise cancellation (ANC) headphones as a SONAR system, emitting an inaudible frequency-modulated continuous-wave (FMCW) signal to track gesture reflections over time. Since this single-receiver SONAR system is unable to differentiate reflection angles and further disentangle signal reflections from different gesture parts, we draw on principles of multi-modal learning to frame gesture motion reconstruction as a multi-modal translation task and propose a deep learning-based approach to fill the information gap between low-dimensional FMCW ranging readings and high-dimensional 3D hand movements. We implement LeekyFeeder on a pair of Google Pixel Buds and conduct experiments to examine the efficacy and robustness of LeekyFeeder in various conditions. Experiments based on six gesture types inspired by Apple Vision Pro demonstrate that LeekyFeeder achieves a PCK performance of 89% at 3cm across ten users, with an average MPJPE and MPJRPE error of 2.71cm and 1.88cm, respectively. View details
    Preview abstract Measuring software development can help drive impactful change. However, it’s a complex task, and getting started can be daunting as it involves understanding what you should measure, and determining what you can measure. This article provides a guide to selecting a framework that aligns with organizational measurement strategy. View details
    Preview abstract Most contextual bandit algorithms assume that rewards are bounded uniformly, or with a high probability for each context and action. Furthermore, the theoretical regret, as well as empirical performance of most prevailing methods scales polynomially with this reward scale. In this work, we use ideas from robust mean estimation to design contextual bandit algorithms where the regret has only a mild logarithmic scaling on the reward scale, with a polynomial dependence on the second moment rather than the maximum reward value. Our algorithm is based on Catoni's mean estimator, is applicable to arbitrary non-linear function classes, and we present both variance aware and variance agnostic versions of our approach. View details
    Preview abstract Natural disasters, including earthquakes, wildfires and cyclones, bear a huge risk on human lives as well as infrastructure assets. An effective response to disaster depends on the ability to rapidly and efficiently assess the intensity of damage. Artificial Intelligence (AI) and Generative Artificial Intelligence (GenAI) presents a breakthrough solution, capable of combining knowledge from multiple types and sources of data, simulating realistic scenarios of disaster, and identifying emerging trends at a speed previously unimaginable. In this paper, we present a comprehensive review on the prospects of AI and GenAI in damage assessment for various natural disasters, highlighting both its strengths and limitations. We talk about its application to multimodal data such as text, image, video, and audio, and also cover major issues of data privacy, security, and ethical use of the technology during crises. The paper also recognizes the threat of Generative AI misuse, in the form of dissemination of misinformation and for adversarial attacks. Finally, we outline avenues of future research, emphasizing the need for secure, reliable, and ethical Generative AI systems for disaster management in general. We believe that this work represents the first comprehensive survey of Gen-AI techniques being used in the field of Disaster Assessment and Response. View details
    Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic
    Jimin Park
    Daniel E. Cook
    Lucas Brambrink
    Joshua Gardner
    Brandy McNulty
    Samuel Sacco
    Ayse G. Keskus
    Asher Bryant
    Tanveer Ahmad
    Jyoti Shetty
    Yongmei Zhao
    Bao Tran
    Giuseppe Narzisi
    Adrienne Helland
    Byunggil Yoo
    Irina Pushel
    Lisa A. Lansdon
    Chengpeng Bi
    Adam Walter
    Margaret Gibson
    Tomi Pastinen
    Rebecca Reiman
    Sharvari Mankame
    T. Rhyker Ranallo-Benavidez
    Christine Brown
    Nicolas Robine
    Floris P. Barthel
    Midhat S. Farooqi
    Karen H. Miga
    Andrew Carroll
    Mikhail Kolmogorov
    Benedict Paten
    Kishwar Shafin
    Nature Biotechnology (2025)
    Preview abstract Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies offer potential advantages in repeat mapping and variant phasing. We present DeepSomatic, a deep-learning method for detecting somatic small nucleotide variations and insertions and deletions from both short-read and long-read data. The method has modes for whole-genome and whole-exome sequencing and can run on tumor–normal, tumor-only and formalin-fixed paraffin-embedded samples. To train DeepSomatic and help address the dearth of publicly available training and benchmarking data for somatic variant detection, we generated and make openly available the Cancer Standards Long-read Evaluation (CASTLE) dataset of six matched tumor–normal cell line pairs whole-genome sequenced with Illumina, PacBio HiFi and Oxford Nanopore Technologies, along with benchmark variant sets. Across samples, both cell line and patient-derived, and across short-read and long-read sequencing technologies, DeepSomatic consistently outperforms existing callers. View details
    Magic state cultivation on a superconducting quantum processor
    Emma Rosenfeld
    Gabrielle Roberts
    Alexis Morvan
    Nathan Lacroix
    Alexandre Bourassa
    arXiv (2025)
    Preview abstract Fault-tolerant quantum computing requires a universal gate set, but the necessary non-Clifford gates represent a significant resource cost for most quantum error correction architectures. Magic state cultivation offers an efficient alternative to resource-intensive distillation protocols; however, testing the proposal's assumptions represents a challenging departure from quantum memory experiments. We present an experimental study of magic state cultivation on a superconducting quantum processor. We implement cultivation, including code-switching into a surface code, and develop a fault-tolerant measurement protocol to bound the magic state fidelity. Cultivation reduces the error by a factor of 40, with a state fidelity of 0.9999(1) (retaining 8% of attempts). Our results experimentally establish magic state cultivation as a viable solution to one of quantum computing's most significant challenges. View details
    Quantum Algorithms for Linear Matrix Equations
    Rolando Somma
    Guang Hao Low
    Dominic Berry
    arXiv:2508.02822 (2025)
    Preview abstract We describe an efficient quantum algorithm for solving the linear matrix equation AX+XB=C, where A, B and C are given complex matrices and X is unknown. This is known as the Sylvester equation, a fundamental equation with applications in control theory and physics. Rather than encoding the solution in a quantum state in a fashion analogous to prior quantum linear algebra solvers, our approach constructs the solution matrix X in a block-encoding, rescaled by some factor. This allows us to obtain certain properties of the entries of X exponentially faster than would be possible from preparing X as a quantum state. The query and gate complexities of the quantum circuit that implements this block-encoding are almost linear in a condition number that depends on A and B, and depend logarithmically in the dimension and inverse error. We show how our quantum circuits can solve BQP-complete problems efficiently, discuss potential applications and extensions of our approach, its connection to Riccati equation, and comment on open problems. View details
    Differentially Private Synthetic Data Release for Topics API Outputs
    Travis Dick
    Josh Karlin
    Adel Javanmard
    Peilin Zhong
    Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2025)
    Preview abstract Recently, significant attention has been devoted to the design and analysis of Privacy-Preserving Ads APIs. Despite the interest of academics and regulators in understanding the privacy properties of such APIs, the empirical study of these methods is severely hindered by the lack of publicly-available data. This is because, a reliable empirical analysis of the privacy properties on an API requires access to a dataset consisting of realistic API outputs for a large collection of users. However, privacy reasons prevent the general release of such browsing data to the public. In this work we address this problem by developing a novel methodology to construct synthetic API outputs that are simultaneously realistic enough to enable accurate study and provide strong privacy protections to the user. We focus on one of the Privacy-Preserving Ads API: the Topics API; part of Google Chrome's Privacy Sandbox which enables interest-based advertising without relying third-party cookies. We develop a methodology to generate a Differentially Private dataset realistic enough to close match the re-identification risk properties of the real Topics API data. The use of differential privacy prevents the leak of private user information from this release. Our methodology is based on first computing a large number of differentially-private statistics describing how output API traces evolve over time. Then, we design a parameterized distribution over sequences of API traces and optimize its parameters so that they closely matches the statistics obtained. Finally, we create the synthetic data by drawing from this distribution. Our work is complemented with an open source release of the anonymized dataset obtained by this methodology. We hope this will enable external researchers to analyze the API in-depth and replicate prior and future work on a realistic large-scale dataset. We believe that this work will contribute to fostering transparency on the privacy properties of Privacy-Preserving Ads APIs. View details
    Preview abstract Recently, decomposing complex problems into simple subtasks--a crucial part of human-like natural planning--to solve the given problem has significantly boosted the performance of large language models (LLMs). However, leveraging such planning structures during post-training to boost the performance of smaller open-source LLMs remains underexplored. Motivated by this, we introduce Plan-Tuning, a unified post-training framework that (i) distills synthetic task decompositions (termed “planning trajectories”) from large-scale LLMs and (ii) fine-tunes smaller models via supervised and reinforcement-learning objectives designed to mimic these planning processes to improve complex reasoning. On GSM8k and the MATH benchmarks, plan-tuned models outperform strong baselines by an average ~7%. Furthermore, plan-tuned models show better generalization capabilities on out-of-domain datasets, with average ~10% and ~12% performance improvements on OlympiadBench and AIME 2024, respectively. Our detailed analysis demonstrates how planning trajectories improves complex reasoning capabilities, showing that Plan-Tuning is an effective strategy for improving task-specific performance of smaller LLMs. View details
    Balanced coupling in electromagnetic circuits
    Juan Atalaya
    Sergei Isakov
    Physical Review Applied, 23 (2025), pp. 024012
    Preview abstract The rotating-wave approximation (RWA) is ubiquitous in the analysis of driven and coupled resonators. However, the limitations of the RWA seem to be poorly understood and in some cases the RWA disposes of essential physics. We investigate the RWA in the context of electrical circuits. Using a classical Hamiltonian approach, we find that by balancing electrical and magnetic components of the resonator drive or resonator-resonator coupling, the RWA can be made exact. This type of balance, in which the RWA is exact, has applications in superconducting qubits, where it suppresses nutation normally associated with strong Rabi driving. In the context of dispersive readout, balancing the qubit-resonator coupling changes the qubit leakage induced by the resonator drive but does not remove it in the case of the transmon qubit. View details
    ×