Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 11065 publications
CrossCheck: Input Validation for WAN Control Systems
Rishabh Iyer
Isaac Keslassy
Sylvia Ratnasamy
Networked Systems Design and Implementation (NSDI) (2026) (to appear)
Preview abstract
We present CrossCheck, a system that validates inputs to the Software-Defined Networking (SDN) controller in a Wide Area Network (WAN). By detecting incorrect inputs—often stemming from bugs in the SDN control infrastructure—CrossCheck alerts operators before they trigger network outages.
Our analysis at a large-scale WAN operator identifies invalid inputs as a leading cause of major outages, and we show how CrossCheck would have prevented those incidents. We deployed CrossCheck as a shadow validation system for four weeks in a production WAN, during which it accurately detected the single incident of invalid inputs that occurred while sustaining a 0% false positive rate under normal operation, hence imposing little additional burden on operators. In addition, we show through simulation that CrossCheck reliably detects a wide range of invalid inputs (e.g., detecting demand perturbations as small as 5% with 100% accuracy) and maintains a near-zero false positive rate for realistic levels of noisy, missing, or buggy telemetry data (e.g., sustaining zero false positives with up to 30% of corrupted telemetry data).
View details
Preview abstract
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
View details
Preview abstract
How many T gates are needed to approximate an arbitrary n-qubit quantum state to within
a given precision ϵ? Improving prior work of Low, Kliuchnikov and Schaeffer, we show that the
optimal asymptotic scaling is Θ(sqrt{2^n log(1/ε)} + log(1/ε)) if we allow an unlimited number of ancilla qubits. We also show that this is the optimal T-count for implementing an arbitrary
diagonal n-qubit unitary to within error ϵ. We describe an application to batched synthesis of
single-qubit unitaries: we can approximate a tensor product of m = O(log log(1/ϵ)) arbitrary
single-qubit unitaries to within error ϵ with the same asymptotic T-count as is required to
approximate just one single-qubit unitary.
View details
Preview abstract
Semantic data models express high-level business concepts and metrics, capturing the business logic needed to query a database correctly. Most data modeling solutions are built as layers above SQL query engines, with bespoke query languages or APIs. The layered approach means that semantic models can’t be used directly in SQL queries. This paper focuses on an open problem in this space – can we define semantic models in SQL, and make them naturally queryable in SQL?
In parallel, graph query is becoming increasingly popular, including in SQL. SQL/PGQ extends SQL with an embedded subset of the GQL graph query language, adding property graph views and making graph traversal queries easy.
We explore a surprising connection: semantic data models are graphs, and defining graphs is a data modeling problem. In both domains, users start by defining a graph model, and need query language support to easily traverse edges in the graph, which means doing joins in the underlying data.
We propose some useful SQL extensions that make it easier to use higher-level data model abstractions in queries. Users can define a “semantic data graph” view of their data, encapsulating the complex business logic required to query the underlying tables correctly. Then they can query that semantic graph model easily with SQL.
Our SQL extensions are useful independently, simplifying many queries – particularly, queries with joins. We make declared foreign key relationships usable for joins at query time – a feature that seems obvious but is notably missing in standard SQL.
In combination, these extensions provide a practical approach to extend SQL incrementally, bringing semantic modeling and graph query together with the relational model and SQL.
View details
Preview abstract
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using ``quantum read-only memory.'' We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order or magnitude or more for a variety reasonably-sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
View details
LeakyFeeder: In-Air Gesture Control Through Leaky Acoustic Waves
Yongjie Yang
Tao Chen
Zhenlin An
Shirui Cao
Shangguan Longfei
SenSys 2025 - The 23rd ACM Conference on Embedded Networked Sensor Systems (2025)
Preview abstract
We present LeekyFeeder, a mobile application that explores the acoustic signals leaked from headphones to reconstruct gesture motions around the ear for fine-grained gesture control. To achieve this goal, LeekyFeeder reuses the speaker and feed-forward microphones on active noise cancellation (ANC) headphones as a SONAR system, emitting an inaudible frequency-modulated continuous-wave (FMCW) signal to track gesture reflections over time. Since this single-receiver SONAR system is unable to differentiate reflection angles and further disentangle signal reflections from different gesture parts, we draw on principles of multi-modal learning to frame gesture motion reconstruction as a multi-modal translation task and propose a deep learning-based approach to fill the information gap between low-dimensional FMCW ranging readings and high-dimensional 3D hand movements. We implement LeekyFeeder on a pair of Google Pixel Buds and conduct experiments to examine the efficacy and robustness of LeekyFeeder in various conditions. Experiments based on six gesture types inspired by Apple Vision Pro demonstrate that LeekyFeeder achieves a PCK performance of 89% at 3cm across ten users, with an average MPJPE and MPJRPE error of 2.71cm and 1.88cm, respectively.
View details
Preview abstract
Measuring software development can help drive impactful change. However, it’s a complex task, and getting started can be daunting as it involves understanding what you should measure, and determining what you can measure. This article provides a guide to selecting a framework that aligns with organizational measurement strategy.
View details
Preview abstract
Most contextual bandit algorithms assume that rewards are bounded uniformly, or with a high probability for each context and action. Furthermore, the theoretical regret, as well as empirical performance of most prevailing methods scales polynomially with this reward scale. In this work, we use ideas from robust mean estimation to design contextual bandit algorithms where the regret has only a mild logarithmic scaling on the reward scale, with a polynomial dependence on the second moment rather than the maximum reward value. Our algorithm is based on Catoni's mean estimator, is applicable to arbitrary non-linear function classes, and we present both variance aware and variance agnostic versions of our approach.
View details
AI and Generative AI Transforming Disaster Management: A Survey of Damage Assessment and Response Techniques
Aman Raj
IEEE Compsac 2025 (2025)
Preview abstract
Natural disasters, including earthquakes, wildfires and cyclones, bear a huge risk on human lives as well as infrastructure assets. An effective response to disaster depends on the ability to rapidly and efficiently assess the intensity of damage. Artificial Intelligence (AI) and Generative Artificial Intelligence (GenAI) presents a breakthrough solution, capable of combining knowledge from multiple types and sources of data, simulating realistic scenarios of disaster, and identifying emerging trends at a speed previously unimaginable. In this paper, we present a comprehensive review on the prospects of AI and GenAI in damage assessment for various natural disasters, highlighting both its strengths and limitations. We talk about its application to multimodal data such as text, image, video, and audio, and also cover major issues of data privacy, security, and ethical use of the technology during crises. The paper also recognizes the threat of Generative AI misuse, in the form of dissemination of misinformation and for adversarial attacks. Finally, we outline avenues of future research, emphasizing the need for secure, reliable, and ethical Generative AI systems for disaster management in general. We believe that this work represents the first comprehensive survey of Gen-AI techniques being used in the field of Disaster Assessment and Response.
View details
Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic
Jimin Park
Daniel E. Cook
Lucas Brambrink
Joshua Gardner
Brandy McNulty
Samuel Sacco
Ayse G. Keskus
Asher Bryant
Tanveer Ahmad
Jyoti Shetty
Yongmei Zhao
Bao Tran
Giuseppe Narzisi
Adrienne Helland
Byunggil Yoo
Irina Pushel
Lisa A. Lansdon
Chengpeng Bi
Adam Walter
Margaret Gibson
Tomi Pastinen
Rebecca Reiman
Sharvari Mankame
T. Rhyker Ranallo-Benavidez
Christine Brown
Nicolas Robine
Floris P. Barthel
Midhat S. Farooqi
Karen H. Miga
Andrew Carroll
Mikhail Kolmogorov
Benedict Paten
Kishwar Shafin
Nature Biotechnology (2025)
Preview abstract
Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies offer potential advantages in repeat mapping and variant phasing. We present DeepSomatic, a deep-learning method for detecting somatic small nucleotide variations and insertions and deletions from both short-read and long-read data. The method has modes for whole-genome and whole-exome sequencing and can run on tumor–normal, tumor-only and formalin-fixed paraffin-embedded samples. To train DeepSomatic and help address the dearth of publicly available training and benchmarking data for somatic variant detection, we generated and make openly available the Cancer Standards Long-read Evaluation (CASTLE) dataset of six matched tumor–normal cell line pairs whole-genome sequenced with Illumina, PacBio HiFi and Oxford Nanopore Technologies, along with benchmark variant sets. Across samples, both cell line and patient-derived, and across short-read and long-read sequencing technologies, DeepSomatic consistently outperforms existing callers.
View details
Magic state cultivation on a superconducting quantum processor
Emma Rosenfeld
Gabrielle Roberts
Alexis Morvan
Nathan Lacroix
Alexandre Bourassa
arXiv (2025)
Preview abstract
Fault-tolerant quantum computing requires a universal gate set, but the necessary non-Clifford gates represent a significant resource cost for most quantum error correction architectures. Magic state cultivation offers an efficient alternative to resource-intensive distillation protocols; however, testing the proposal's assumptions represents a challenging departure from quantum memory experiments. We present an experimental study of magic state cultivation on a superconducting quantum processor. We implement cultivation, including code-switching into a surface code, and develop a fault-tolerant measurement protocol to bound the magic state fidelity. Cultivation reduces the error by a factor of 40, with a state fidelity of 0.9999(1) (retaining 8% of attempts). Our results experimentally establish magic state cultivation as a viable solution to one of quantum computing's most significant challenges.
View details
Preview abstract
We describe an efficient quantum algorithm for solving the linear matrix equation AX+XB=C, where A, B and C are given complex matrices and X is unknown. This is known as the Sylvester equation, a fundamental equation with applications in control theory and physics. Rather than encoding the solution in a quantum state in a fashion analogous to prior quantum linear algebra solvers, our approach constructs the solution matrix X in a block-encoding, rescaled by some factor. This allows us to obtain certain properties of the entries of X exponentially faster than would be possible from preparing X as a quantum state. The query and gate complexities of the quantum circuit that implements this block-encoding are almost linear in a condition number that depends on A and B, and depend logarithmically in the dimension and inverse error. We show how our quantum circuits can solve BQP-complete problems efficiently, discuss potential applications and extensions of our approach, its connection to Riccati equation, and comment on open problems.
View details
Differentially Private Synthetic Data Release for Topics API Outputs
Travis Dick
Josh Karlin
Adel Javanmard
Peilin Zhong
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2025)
Preview abstract
Recently, significant attention has been devoted to the design and analysis of Privacy-Preserving Ads APIs. Despite the interest of academics and regulators in understanding the privacy properties of such APIs, the empirical study of these methods is severely hindered by the lack of publicly-available data. This is because, a reliable empirical analysis of the privacy properties on an API requires access to a dataset consisting of realistic API outputs for a large collection of users. However, privacy reasons prevent the general release of such browsing data to the public.
In this work we address this problem by developing a novel methodology to construct synthetic API outputs that are simultaneously realistic enough to enable accurate study and provide strong privacy protections to the user.
We focus on one of the Privacy-Preserving Ads API: the Topics API; part of Google Chrome's Privacy Sandbox which enables interest-based advertising without relying third-party cookies. We develop a methodology to generate a Differentially Private dataset realistic enough to close match the re-identification risk properties of the real Topics API data. The use of differential privacy prevents the leak of private user information from this release.
Our methodology is based on first computing a large number of differentially-private statistics describing how output API traces evolve over time. Then, we design a parameterized distribution over sequences of API traces and optimize its parameters so that they closely matches the statistics obtained. Finally, we create the synthetic data by drawing from this distribution.
Our work is complemented with an open source release of the anonymized dataset obtained by this methodology. We hope this will enable external researchers to analyze the API in-depth and replicate prior and future work on a realistic large-scale dataset. We believe that this work will contribute to fostering transparency on the privacy properties of Privacy-Preserving Ads APIs.
View details
PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving
Mihir Parmar
Chitta Baral
Mingyang Ling
2025
Preview abstract
Recently, decomposing complex problems into simple subtasks--a crucial part of human-like natural planning--to solve the given problem has significantly boosted the performance of large language models (LLMs). However, leveraging such planning structures during post-training to boost the performance of smaller open-source LLMs remains underexplored. Motivated by this, we introduce Plan-Tuning, a unified post-training framework that (i) distills synthetic task decompositions (termed “planning trajectories”) from large-scale LLMs and (ii) fine-tunes smaller models via supervised and reinforcement-learning objectives designed to mimic these planning processes to improve complex reasoning. On GSM8k and the MATH benchmarks, plan-tuned models outperform strong baselines by an average ~7%. Furthermore, plan-tuned models show better generalization capabilities on out-of-domain datasets, with average ~10% and ~12% performance improvements on OlympiadBench and AIME 2024, respectively. Our detailed analysis demonstrates how planning trajectories improves complex reasoning capabilities, showing that Plan-Tuning is an effective strategy for improving task-specific performance of smaller LLMs.
View details
Balanced coupling in electromagnetic circuits
Juan Atalaya
Sergei Isakov
Physical Review Applied, 23 (2025), pp. 024012
Preview abstract
The rotating-wave approximation (RWA) is ubiquitous in the analysis of driven and coupled resonators. However, the limitations of the RWA seem to be poorly understood and in some cases the RWA disposes of essential physics. We investigate the RWA in the context of electrical circuits. Using a classical Hamiltonian approach, we find that by balancing electrical and magnetic components of the resonator drive or resonator-resonator coupling, the RWA can be made exact. This type of balance, in which the RWA is exact, has applications in superconducting qubits, where it suppresses nutation normally associated with strong Rabi driving. In the context of dispersive readout, balancing the qubit-resonator coupling changes the qubit leakage induced by the resonator drive but does not remove it in the case of the transmon qubit.
View details