Microsoft Research Blog

BenchmarkQED: Automated benchmarking of RAG systems

June 5, 2025 | Darren Edge, Ha Trinh, Andres Morales Esquivel, and Jonathan Larson

BenchmarkQED is an open-source toolkit for benchmarking RAG systems using automated query generation, evaluation, and dataset prep. It shows that LazyGraphRAG outperforms standard methods, especially on complex, global queries.

Diagram showing how the dimensions of query source (data-driven vs activity-driven) and query scope (local vs global) create four query classes that span the local-to-global query spectrum: data-local, activity-local, data-global, and activity-global.

Recent Posts

FrodoKEM: A conservative quantum-safe cryptographic algorithm

May 27, 2025 | Patrick Longa

The recent advances in quantum computing offer many advantages—but also challenge current cryptographic strategies. Learn how FrodoKEM could help strengthen security, even in a future with powerful quantum computers.
Magentic-UI, an experimental human-centered web agent

May 19, 2025

Magentic-UI, new from Microsoft Research, is an open-source research prototype of a human-centered AI agent, designed to work with people to complete complex, web-based tasks in real time over a web browser.
Predicting and explaining AI model performance: A new approach to evaluation

May 12, 2025 | Lexin Zhou and Xing Xie

ADeLe, a new evaluation method, explains what AI systems are good at—and where they’re likely to fail. By breaking tasks into ability-based requirements, it has the potential to provide a clearer way to evaluate and predict AI model performance.
Research Focus: Week of May 7, 2025

May 7, 2025

In this issue: New research on compound AI systems and causal verification of the Confidential Consortium Framework; release of Phi-4-reasoning; enriching tabular data with semantic structure; and more.
Microsoft Fusion Summit explores how AI can accelerate fusion research

May 7, 2025 | Kenji Takeda, Shruti Rajurkar, and Ade Famoti

The first Microsoft Research Fusion Summit brought together global experts to explore how AI can help unlock the potential of fusion energy. Discover how collaborations with leading institutions can help speed progress toward clean, scalable energy.
Societal AI: Building human-centered AI systems

May 5, 2025 | Beibei Shi, Haotian Li, and Xing Xie

Learn about a new white paper on Societal AI, an interdisciplinary framework for guiding AI development that reflects shared human values. It presents key research challenges and emphasizes collaboration across disciplines.
Research Focus: Week of April 21, 2025

April 23, 2025

In this issue: our CHI 2025 & ICLR 2025 contributions, plus research on causal reasoning & LLMs; countering LLM jailbreak attacks; and how people use AI vs. AI-alone. Also, SVP of Microsoft Health Jim Weinstein talks rural healthcare innovation.
The Future of AI in Knowledge Work: Tools for Thought at CHI 2025

April 18, 2025

Join us at CHI 2025 to explore how AI systems can be used as Tools for Thought as we reimagine AI’s role in human thinking. Learn about new research, prototypes, and a workshop on designing AI that supports critical thinking, decision-making, and creativity.
Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project

April 14, 2025

Semantic Telemetry Project data show that people who use AI for more professional and complex tasks are more likely to keep using the tool and to use it more often. Novice AI users engage in simpler tasks, but their usage is becoming more complex.
Debug-gym: an environment for AI coding tools to learn how to debug code like programmers

April 10, 2025

Developers spend a lot of time debugging code. Learn how debug-gym can equip AI agents to help, enabling them to set breakpoints, navigate the codebase, and print runtime variable values on demand, so they better understand the code and its execution flow.
Research Focus: Week of April 7, 2025

April 9, 2025

In this issue: We introduce a new dataset designed to assist renewable energy infrastructure planners, a new method for denoising MRI imagery, and an AI tool for analyzing distant galaxies. Check out our latest research and other updates.