May 19th, 2025

Unlock Instant On-Device AI with Foundry Local

Raji Rajagopalan
VP, Product

You’re building a next-generation AI-powered app. It needs to be fast, private, and able to work anywhere, even without internet connectivity. This isn’t just about prototyping. You’re shipping a real app to real users, with AI that delivers value and scales cost-effectively.

Meet Foundry Local—the high-performance local AI runtime stack that brings Azure AI Foundry’s power to client devices. Now in preview on Windows and macOS, Foundry Local lets you build and ship cross-platform AI apps that run models, tools, and agents directly on-device. It is included in Windows AI Foundry, delivering best-in-class AI capabilities and excellent cross-silicon performance on hundreds of millions of Windows devices.

This is local AI, efficient and ready for production.

What is Foundry Local?

Foundry Local brings the power and trust of Azure AI Foundry to your device. It includes everything you need to run AI apps locally.

Foundry Local Stack

High-Performance Model Execution with ONNX Runtime

Foundry Local is built on ONNX Runtime for top-tier performance across CPUs, NPUs, and GPUs. On Windows, it is integrated and optimized through deep collaboration with hardware vendors like AMD, Intel, NVIDIA, and Qualcomm on the foundation of Windows ML. On macOS, it includes GPU acceleration on Apple silicon. Foundry Local can choose the optimal silicon on the device to run local models, with the option to specify the silicon (CPU, GPU, or NPU) for execution.
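As a rough illustration of the idea (not Foundry Local's actual internals, which are handled by the runtime), picking the best available silicon while still honoring an explicit override might look like this hypothetical helper:

```python
# Illustrative sketch only: Foundry Local's real device selection is internal
# to the runtime. This hypothetical helper shows the general idea of
# preferring the best available silicon with an optional explicit override.

def pick_execution_device(available, preferred=None):
    """Pick an execution device from `available`, honoring a preference.

    available: set of device names detected on the machine, e.g. {"cpu", "gpu"}
    preferred: optional explicit choice ("cpu", "gpu", or "npu")
    """
    if preferred is not None:
        if preferred not in available:
            raise ValueError(f"requested device {preferred!r} not available")
        return preferred
    # Default priority: NPU first, then GPU, then CPU as the fallback.
    for device in ("npu", "gpu", "cpu"):
        if device in available:
            return device
    raise RuntimeError("no supported execution device found")
```

So on a machine with only a CPU and GPU, `pick_execution_device({"cpu", "gpu"})` falls through to the GPU, while passing `preferred="cpu"` pins execution to the CPU.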

Go from Exploration to Production with Foundry Local Management Service

With Foundry Local, moving from prototype to production is effortless. You can use a wide range of edge-optimized AI Foundry models—including DeepSeek R1, Qwen 2.5 Instruct, Phi-4 Reasoning, Mistral, and additional ONNX-format models from Hugging Face—directly in your app. Foundry Local Management Service takes care of downloading and loading models at runtime. You can build a Windows, macOS, or cross-platform application using Foundry Local and ship it to your customers.

Seamless Local AI Access with Foundry CLI & SDK

Use the new Foundry CLI to manage local models, tools, and agents with ease. With Foundry Local SDK and the Azure Inference SDK, you can interact with Foundry Local and integrate model management and local inference directly into your app. You can also use OpenAI-compatible chat completion APIs to integrate your application with Foundry Local.
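Because the surface is OpenAI-compatible, the request shape is the familiar chat completion body. Here is a small Python sketch; the local base URL below is an assumption for illustration (use the endpoint your Foundry Local service actually reports):

```python
import json

# Sketch of targeting Foundry Local's OpenAI-compatible chat completions API.
# BASE_URL and the model name are illustrative assumptions; substitute the
# endpoint and model identifier your local Foundry service reports.
BASE_URL = "http://localhost:5273/v1"  # hypothetical local port

def build_chat_request(model, user_message):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

body = build_chat_request("phi-3.5-mini", "Summarize ONNX Runtime in one line.")
payload = json.dumps(body)
# POST `payload` to f"{BASE_URL}/chat/completions" with any HTTP client;
# the response follows the OpenAI chat completion schema, so existing
# OpenAI client code can be pointed at the local endpoint unchanged.
```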

Local AI Agents using MCP

Foundry Local is redefining local AI workflows with intelligent agents at the core. Using the Model Context Protocol (MCP) to call local tools, it offers a new path to smart automation—right on your device. If you’d like to participate in this private preview, please fill out the form here.
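To give a flavor of what MCP tool calling involves, the sketch below builds a tool descriptor in the shape MCP uses (a name, a description, and a JSON Schema for the input); the `search_notes` tool itself is a made-up example, not part of Foundry Local:

```python
# Minimal sketch of an MCP-style tool descriptor. MCP servers advertise
# tools via tools/list, each with a name, a description, and a JSON Schema
# describing its input. The "search_notes" tool here is purely illustrative.

def make_tool_descriptor(name, description, properties, required):
    """Return a tool descriptor in the shape MCP's tools/list uses."""
    return {
        "name": name,
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }

search_notes = make_tool_descriptor(
    "search_notes",
    "Search the user's local notes for a phrase.",
    {"query": {"type": "string", "description": "Text to search for"}},
    ["query"],
)
```

A local agent can then decide when to invoke such a tool during a conversation, keeping both the model and the tool execution on-device.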

Get Started

On Windows

  1. Open Windows Terminal
  2. Install Foundry Local using winget
    winget install Microsoft.FoundryLocal
    
  3. Run a model
    foundry model run phi-3.5-mini
    

On macOS

  1. Open Terminal
  2. Install Foundry Local
    brew tap microsoft/foundrylocal
    brew install foundrylocal
    
  3. Run a model
    foundry model run phi-3.5-mini
    

For more information, check out the Foundry Local documentation and samples here.

What Our Private Preview Customers Are Telling Us

Over 100 customers – including ISVs and partners such as SoftwareOne, Pieces.app, and Avanade – have already been using Foundry Local and have helped shape it.

“Now with Foundry Local, we have the flexibility to build hybrid agentic solutions leveraging the power of both cloud & on-premise, thereby allowing us to deliver greater value to our customers without compromising on compliance or innovation. This is a game-changer!” – Ratheesh Krishna Geeth, CEO – Digital Engineering & AI, iLink Digital

“Foundry Local is positioned to provide the robust infrastructure needed to guarantee the integrity and continuous availability of these critical workflows, enabling us to deliver transformative healthcare solutions with the highest standards of reliability and quality.” – Brian Hartzer, CEO, Quantium Health

Foundry Local makes local AI practical, powerful, and production-ready. Whether you’re building a proof-of-concept or shipping a product, it gives you the performance, flexibility, and control to run AI where it matters most. Let’s build the future of local AI—together.

Author

Raji Rajagopalan
VP, Product

Raji Rajagopalan is a VP in the CoreAI group at Microsoft. She and her team are responsible for building technology to enable efficient execution of AI across cloud and edge. In the last two decades in the tech industry, Raji has worked across Engineering and Product disciplines, building startups and running global teams.
