Generative AI-Powered Video Analytics AI Agents

Discover a collection of reference workflows that use vision language models to deliver rich, interactive visual perception capabilities for a wide range of industries.

Workloads

Computer Vision / Video Analytics

Industries

Manufacturing
Smart Cities/Spaces
Retail/ Consumer Packaged Goods
Media and Entertainment
Healthcare and Life Sciences

Business Goal

Return on Investment
Innovation

Products

NVIDIA Metropolis
NVIDIA AI Enterprise

Overview

Power A New Wave Of Applications

Traditional video analytics applications and their development workflows are typically built on fixed-function, limited models that are designed to detect and identify only a select set of predefined objects. With generative AI and foundation models, you can now build applications with fewer models that have incredibly complex and broad perception and rich contextual understanding. This newer generation of vision language models (VLMs) is giving rise to smart, powerful video analytics AI agents.

What Is a Video Analytics AI Agent?

A video analytics AI agent can combine both vision and language modalities to understand natural language prompts and perform visual question-answering. For example, answering a broad range of questions in natural language that can be applied against a recorded or live video stream. This deeper understanding of video content enables more accurate and meaningful interpretations, improving the functionality of video analytics applications and the analysis of real-world scenarios. These agents promise to unlock entirely new insights and possibilities for automation.

Streamline Every Industrial Operation

Highly perceptive, accurate, and interactive video analytics  AI agents will be deployed throughout our factories, warehouses, retail stores, airports, traffic intersections, and more. This will have a tremendous impact on operations teams looking to make better decisions using richer insights generated from natural interactions. Managers and operations teams will also communicate with these agents in natural language, all powered by generative AI and VLMs with NVIDIA NIM™ microservices at their core.


Technical Implementation

Develop With NVIDIA NIM

NVIDIA NIM is a set of inference microservices that includes industry-standard APIs, domain-specific code, optimized inference engines, and enterprise runtime. It delivers multiple VLMs for building your video analytics AI agent that can process live or archived images or videos to extract actionable insights using natural language. We’ve created a reference workflow of a video analytics AI agent that you can try out to accelerate your development process.

Build AI Agents With NVIDIA AI Blueprint

The NVIDIA AI Blueprint for video search and summarization (VSS) makes it easy to build and customize video analytics AI agents using generative AI, VLMs, LLMs, and NVIDIA NIM. The video analytics AI agents are given tasks through natural language and can analyze, interpret, and process vast amounts of video data to provide critical insights that help a range of industries optimize processes, improve safety, and cut costs.

VSS enables seamless integration of generative AI into existing computer vision pipelines—enhancing inspection, search, and analytics with multimodal understanding and zero-shot reasoning. You can easily deploy from the edge to the cloud on platforms including NVIDIA RTX PRO™ 6000, NVIDIA DGX™ Spark and NVIDIA® Jetson Thor™.

Experience the blueprint on API catalog.

Create Edge Agents With Jetson Platform Services

You can build video analytics AI agents powered by the NVIDIA Jetson™ edge AI platform using the newest feature of NVIDIA JetPack™—Jetson Platform Services. The generative AI application is completely running on an NVIDIA Jetson Orin™ device that’s capable of detecting events to generate alerts and facilitate interactive Q&A sessions.


FAQs

NIM is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing across the cloud, data center, and workstations. It supports a wide range of AI models, including open-source community and NVIDIA AI Foundation models, to ensure seamless, scalable AI inferencing—on-premises or in the cloud—using industry-standard APIs. All NIM microservices and associated preview APIs can be found at build.nvidia.com.

Visit build.nvidia.com to create an account and start exploring the available NIM microservices. You can check out the VLM NIMs available here.

Try the NVIDIA AI Blueprint for video search and summarization for free.

All users can get started for free with the preview APIs on build.nvidia.com. Each new account can receive up to 5,000 credits to try out the APIs. To continue development after credits run out, you can deploy the downloadable NIM microservices locally to your hardware or to a cloud instance. Developers can also access NIM via the NVIDIA Developer Program. See details in this FAQ.

NVIDIA NIM is free for developers to try out. To go to production, downloadable NIM microservices require an NVIDIA AI Enterprise License. To learn more, visit this page.

The NIM developer forum is the best place to ask questions and engage with our developer community. You can access the forums here.

Get Started

Build Video Analytics AI Agents

Explore the reference workflow, powered by multiple visual language models, to easily build your video analytics AI agent.

Developers in Action

Build Advanced Video Analytics AI Agents

Learn how to seamlessly build a video analytics AI agent using NVIDIA AI Blueprint for video search and summarization (VSS).

Develop Video Analytics AI Agents for the Edge

Explore VLM-based video analytics AI agents at the edge using NVIDIA Jetson Platform Services.

Build an Agentic Video Workflow

Learn how to build a workflow with audio input, speech output for video search, and summarization.

Build Real-Time Multimodal XR Apps

Learn how to use NVIDIA AI Blueprint for video search and summarization to support audio in an XR environment.

Deploy AI Agents From Edge to Cloud

Tap into the power of the VSS blueprint to deploy AI agents seamlessly from edge to cloud, with scalable performance across a diverse range of GPUs. VSS support for NVIDIA DGX Spark and Jetson Thor is coming soon.

NVIDIA Jetson Thor

Accelerate the future of physical AI and robotics with NVIDIA Jetson Thor series modules that deliver up to 2070 FP4 TFLOPS of AI compute and 128 GB of memory—all in a compact form factor. 

NVIDIA RTX PRO 6000 Blackwell Series GPUs

NVIDIA RTX PRO 6000 Blackwell Series GPUs accelerate physical AI by running every robot development workload across training, synthetic data generation, robot learning, and simulation.

NVIDIA DGX Spark

NVIDIA DGX Spark brings the power of NVIDIA Grace Blackwell to developer desktops. The NVIDIA GB10 Superchip, combined with 128 GB of unified system memory, lets AI researchers, data scientists, and students work with AI models locally with up to 200 billion parameters.

Related Customer Stories