AI & Cloud: Next-Gen Enterprise Guide
Machine Learning
Trains a machine to solve a specific AI problem using data, statistics, and mathematical prediction. Deep learning algorithms are the neural-network-based subset of machine learning.
• Foundation Models are AI neural networks trained on massive unlabeled datasets to handle a wide variety of jobs.
• A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data.
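The self-attention weighting described above can be sketched in a few lines. This is a minimal single-head version for illustration, not any particular production implementation:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    Each position's output is a weighted sum of all value vectors, with
    weights softmax(QK^T / sqrt(d_k)) -- the 'differential weighting'
    of input parts described above.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv               # project inputs
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = self_attention(x, Wq, Wk, Wv)
```

Each row of `attn` sums to 1: every output position distributes its attention across all input positions.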
The Next Era of Generative AI
[Figure: timeline of model generations, from image classification on labeled datasets (ResNet-50) through language models trained on unlabeled datasets (BERT, Turing NLG, NLLB, GPT-3), to generative chatbots, large language models (transformers), multimodal generative AI (GPT-4, Gemini), and Mixture of Experts (MoE) models (Mixtral). Modalities span text, audio, image, video, 3D, animation, DNA, protein, and molecule. Production GenAI inference targets: real-time (<50 ms latency), >10T parameters, >32K input sequence length.]
Explosive Growth in AI Computational Requirements
[Chart: training compute (petaFLOPs, log scale) vs. year, 2012-2024. Compute grows from roughly 10^2 petaFLOPs for AlexNet (2012) through ResNet-50, Xception, XLNet, MoCo, Wav2Vec 2.0, and GPT-2 1.5B (~10^6), to ~10^8 for GPT-3 175B, Microsoft T-NLG, Megatron-NLG, MT-NLG 530B, Chinchilla, BLOOM, and PaLM.]
Executives Adding Gen AI to Strategy
[Chart: projected growth from 2024 to 11 ZB by 2027.]
[Chart: throughput (tokens/second per GPU, up to ~140) vs. interactivity per user (tokens/second, 0-50) for inference under different parallelism layouts (TP = tensor, EP = expert, PP = pipeline, DP = data parallel): TP2.EP8.DP4, TP2.EP16.PP2, TP4.EP16, TP4.EP2.PP2.DP4, TP8.PP4.DP2, TP8.PP2.DP4, TP64. The GB200 FP4 and B200 FP8 curves sit far above Hopper (TP64): Blackwell delivers up to 30x Hopper throughput.]
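The parallelism labels (TP = tensor, EP = expert, PP = pipeline, DP = data parallel) encode how a model is sharded across GPUs. Assuming, as these labels suggest, that the degrees are independent factors, they multiply together to give the total GPU count; a small sketch:

```python
import re
from math import prod

def total_gpus(config: str) -> int:
    """Total GPU count for a parallelism layout such as 'TP2.EP8.DP4'.

    Tensor- (TP), expert- (EP), pipeline- (PP), and data-parallel (DP)
    degrees are assumed here to be independent factors whose product is
    the number of GPUs used.
    """
    return prod(int(m) for m in re.findall(r"\d+", config))

# Every configuration on the chart resolves to the same 64-GPU budget:
sizes = {c: total_gpus(c)
         for c in ["TP2.EP8.DP4", "TP2.EP16.PP2", "TP8.PP4.DP2", "TP64"]}
```

This is why the chart can compare layouts directly: each point trades interactivity against throughput at a fixed GPU count.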
Accelerating Data Processing with Decompression Engine
[Chart: queries/second (scale to ~60) comparing 1x x86 CPU, 1x H100 GPU, and 1x Blackwell GPU from a Grace Blackwell GB200 Superchip on a database join query with Snappy/Deflate compression, derived from the TPC-H Q4 query. Projected performance subject to change.]
New Era of Secure AI
Confidential Computing for Performant Massive LLMs
[Diagram: confidential-computing topology. The CPU hosts a secure enclave and connects over PCIe to the Blackwell GPU and to the NIC; NVLink scales the secure domain to 128 Blackwell GPUs.]
• Support for custom models: best accuracy for enterprise by enabling tuning with proprietary data sources and domain-specific code
• Optimized inference engines: improved TCO with best latency and throughput running on accelerated infrastructure
• Enterprise software: feature branches, validation, and support
NVIDIA Inception Helps Tech Startups Build and Grow Faster
Open to tech startups of all stages working in AI, AR/VR, Gaming, Networking, and Graphics
[Logos: member startups including Astria, Audoir, and Promethean AI.]
Fastest Path to Production RAG Solutions
1. Identify POC Use Case: Start with internal use cases that have no data access or IP concerns. Ideal data: support docs, corporate communications, employee-experience HR tools, sales communications.
2. Set Up POC Environment: Create the POC environment where your data is stored today. Cloud provides a fast path to get started; on prem, plan for environments that can scale to support multiple POCs.
3. Build RAG Workflow: Leverage the NVIDIA AI workflow for a step-by-step guide to building generative AI chatbots connected to an enterprise knowledge base with retrieval-augmented generation.
4. Evaluate Performance: Begin with public benchmarks and establish an internal test set to gauge accuracy and latency requirements. Move to human testing to ensure the workflow performs as expected.
5. Scale Deployment: Move the workload to a production environment and scale based on use. Measure usage, accuracy, and human feedback to determine the best scale options.
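The build-a-RAG-workflow step can be sketched end to end. Here `embed()` is a toy bag-of-words stand-in for an embedding service such as NeMo Retriever, the documents are invented examples, and the final prompt is what would be sent to an LLM:

```python
import re
import numpy as np

# Toy stand-in for an embedding microservice; real deployments call a
# trained embedding model instead of counting words.
def embed(text: str, vocab: list[str]) -> np.ndarray:
    words = re.findall(r"[a-z]+", text.lower())
    v = np.array([words.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

# Steps 1-2: a POC knowledge base built from internal support docs.
docs = [
    "reset your password from the account settings page",
    "vacation requests are submitted through the HR portal",
    "sales leads are tracked in the CRM dashboard",
]
vocab = sorted({w for d in docs for w in re.findall(r"[a-z]+", d.lower())})
index = np.stack([embed(d, vocab) for d in docs])  # the vector index

# Step 3: retrieval-augmented generation - retrieve, then build the
# prompt that an LLM would answer.
def rag_prompt(question: str, k: int = 1) -> str:
    scores = index @ embed(question, vocab)        # cosine similarity
    top = np.argsort(scores)[::-1][:k]
    context = "\n".join(docs[i] for i in top)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = rag_prompt("How do I reset my password?")
```

Steps 4-5 (evaluation and scaling) then measure the accuracy and latency of exactly this retrieve-augment-generate loop.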
Optimized Retrieval Augmented Generation with NVIDIA
Next Generation of Enterprise Applications Connect LLMs to Enterprise Data
Retrieval Augmented Generation Improves LLM Performance and Efficiency
• Models can answer questions about information without having been trained on that data
• Human-readable output: texts that are easier for people to understand, raising user trust
• AI models better understand context when generating text or other outputs
• Reduced computational costs: avoids retraining and reduces model size at inference
• Models can produce diverse outputs without sacrificing accuracy or efficiency
Enterprise-Ready Generative AI with RAG and NVIDIA NIM
Ease the journey from pilot to production
Development Deployment
NVIDIA AI Enterprise
https://www.nvidia.com/en-us/ai-data-science/ai-workflows/generative-ai-chatbots/
NeMo Retriever Supercharges RAG Applications
World class accuracy and throughput
• World-class accuracy with nearly 2x fewer incorrect answers
• 7x faster embedding inference throughput
• Production ready
[Diagram: text docs, images, and PDFs flow through embedding microservices into a vector store that serves prompt- and event-driven RAG applications.]
RAG Applications Depend on High Quality Text Embeddings
[Diagram: the user prompt "What is the correct dose for a 2-year-old?" is answered two ways. With low-quality embeddings from a commercial embedding model, retrieval surfaces only the Dose and Ingredients fields of the source, and the response is "The correct dose is 7.5 mL." With high-quality embeddings from the NVIDIA Retrieval QA Embedding Model, retrieval also surfaces the Age field (alongside Dose, Ingredients, Expiration, and Lot #), and the response is "For a 2-year-old the correct dose is 5 mL."]
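The effect in this slide can be made concrete with contrived 3-dimensional embeddings. The vectors below are invented purely for illustration (they come from neither model); the point is that a model which fails to encode the "age" aspect of the query retrieves the wrong chunk:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Document chunks; axes of the toy vectors: (dose, age, ingredients).
chunks = ["Dose: adults 7.5 mL", "Dose by age: 2 years -> 5 mL", "Ingredients: ..."]

low_q = {   # low-quality model: ignores the age dimension entirely
    "query": np.array([1.0, 0.0, 0.0]),
    "chunks": [np.array([1.0, 0.0, 0.0]),
               np.array([0.6, 0.0, 0.3]),
               np.array([0.0, 0.0, 1.0])],
}
high_q = {  # high-quality model: encodes that the query mentions an age
    "query": np.array([1.0, 0.9, 0.0]),
    "chunks": [np.array([1.0, 0.0, 0.0]),
               np.array([0.8, 0.9, 0.0]),
               np.array([0.0, 0.0, 1.0])],
}

def best_chunk(model):
    scores = [cosine(model["query"], c) for c in model["chunks"]]
    return chunks[int(np.argmax(scores))]

low_pick, high_pick = best_chunk(low_q), best_chunk(high_q)
```

The low-quality space retrieves the generic adult dose; the high-quality space retrieves the age-specific one, which is what drives the two different answers in the diagram.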
Gen AI for Technician Support
Features
• Ingest (embed) large documents into a vector database for semantic search
• Cite sources in retrieved answers
• Extract and cite images, captions*
• Guardrails (no hallucination)
ai.nvidia.com
NVIDIA NIM
[Diagram: prompts and events flow into the NIM microservice.]
K8s Support | Metrics & Monitoring | Identity | Secret Management | Liveness Probe | Inflight Batching
[Catalog: application NIMs spanning models such as Code Llama 70B, Cohere 35B, Gemma 7B, Adept FuYu 8B, Deplot, Edify (Getty and Shutterstock), Audio2Face, Riva ASR, cuOpt, Earth-2, DeepVariant, Jamba, Llama 2 70B, Mistral 7B, Kosmos-2, DiffDock, and ESMFold.]
Demo: a NIM with the Typhoon 7B community model (step 2: upload documents)
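NIM microservices generally expose an OpenAI-compatible chat-completions API. A minimal sketch of building such a request body follows; the model id is an illustrative placeholder, not a guaranteed catalog entry:

```python
import json

def build_chat_request(model: str, user_prompt: str,
                       temperature: float = 0.2, max_tokens: int = 256) -> dict:
    # OpenAI-compatible chat-completion payload; NIM endpoints generally
    # mirror this widely used schema.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

# Illustrative model id, not a guaranteed catalog entry.
payload = build_chat_request("meta/llama2-70b", "Summarize our warranty policy.")
body = json.dumps(payload)
# body would be POSTed, with an API key header, to a chat/completions
# endpoint obtained via ai.nvidia.com.
```

Because the schema matches the OpenAI convention, existing client libraries can usually be pointed at a NIM endpoint by changing only the base URL and key.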
NVIDIA NEMO FRAMEWORK
Building Generative AI Applications for the Enterprise
Build, customize and deploy generative AI models with NVIDIA NeMo
[Diagram: in-domain queries flow through the NeMo stack and return in-domain, secure, cited responses.]
NeMo Curator Megatron Core NeMo Aligner Triton & TensorRT-LLM NeMo Retriever NeMo Guardrails
NVIDIA NeMo
NVIDIA AI Enterprise
• Multi-modality support: build language, image, and generative AI models
• Data curation @ scale: extract, deduplicate, and filter info from large unstructured data @ scale
• Optimized training: accelerate training and throughput by parallelizing the model and the training data across 1,000s of nodes
• Model customization: easily customize with p-tuning, SFT, adapters, RLHF, ALiBi
• Deploy at scale anywhere: run optimized inference at scale anywhere
• Guardrails support: keep applications aligned with safety and security requirements using NeMo Guardrails
• Enterprise support: NVIDIA AI Enterprise and experts by your side to keep projects on track
Suite of Model Customization Tools in NeMo
Ways to customize large language models for your use-cases
Four approaches, from lightest to heaviest touch (the column headers for each approach were lost in extraction):

Approach 1 (lightest)
• Benefits: good results leveraging pre-trained LLMs; lowest investment; least expertise
• Challenges: cannot add as many skills or domain-specific data to the pre-trained LLM

Approach 2
• Benefits: better results leveraging pre-trained LLMs; lower investment; will not forget old skills
• Challenges: less comprehensive ability to change all model parameters

Approach 3
• Benefits: best results leveraging pre-trained LLMs; will not forget old skills
• Challenges: medium investment; takes longer to train; more expertise needed

Approach 4 (heaviest)
• Benefits: best results leveraging pre-trained LLMs; changes all model parameters
• Challenges: may forget old skills; large investment; most expertise needed
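One widely used parameter-efficient technique behind adapter-style customization is a low-rank (LoRA-style) update to a frozen weight matrix: instead of changing all parameters, training only fits two small factors. This numpy sketch illustrates the arithmetic, not NeMo's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4              # rank r << d: the adapter is tiny

W = rng.normal(size=(d_out, d_in))      # frozen pre-trained weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # zero-init: training starts exactly at W

alpha = 8.0                             # scaling hyperparameter
W_adapted = W + (alpha / r) * (B @ A)   # effective weight used at inference

full_params = W.size                    # what full fine-tuning would update
adapter_params = A.size + B.size        # what the adapter updates instead
```

Because the base weight is frozen, old skills are preserved, and the trainable parameter count drops by d/(2r) (8x in this toy setting, far more at real model sizes).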
Beyond Text Generation: Building Generative AI Applications for Images and Videos
NVIDIA Optimized Visual Foundation Models
Model Purpose
NV-DINOv2: Vision-only backbone for downstream Vision AI tasks: image classification, detection, segmentation
NV-CLIP: Image-text matching model that can align image features with text features; backbone for downstream Vision AI tasks: image classification, detection, segmentation
EfficientViT-SAM: Faster, more efficient version of SAM (Segment Anything Model), a visual FM for segmenting any object based on different types of visual prompts such as a single coordinate or a bounding box
VILA: Family of visual language models for image and video understanding and Q&A
LITA: Visual language model for video understanding and context with spatial and temporal localization
FoundationPose: 6-DoF object pose estimation and tracking, providing the object pose and 3D bounding box
BEVFusion: Sensor-fusion model that fuses multiple input sensors (cameras, LiDAR, radar, etc.) to create a bird's-eye view of the scene with a 3D bounding-box representation of the objects
NeVa: Multi-modal visual language model for image understanding and Q&A
LiDARSAM: Segment any object based on user-provided text prompts on 3D point-cloud LiDAR data
Fine-Tune with 100 or Fewer Samples for Image Classification
[Diagram: data flows through foundation backbones (NV-DINO / NV-CLIP, diffusion) to produce zero-shot outputs: class labels, bounding boxes, masks, text, and images.]
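One reason so few samples suffice is that a frozen foundation backbone already yields well-separated features, so "fine-tuning" can reduce to fitting a light classifier in feature space. A self-contained sketch with an invented random-projection stand-in for the backbone (purely illustrative, not NV-DINOv2):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a frozen foundation backbone (e.g. an NV-DINOv2-style
# encoder): a fixed random projection plus tanh, purely illustrative.
proj = rng.normal(size=(16, 2))
def backbone_features(x):
    return np.tanh(x @ proj.T)

# A handful of labeled samples per class: the "100 or fewer" regime.
class0 = rng.normal(loc=-1.0, scale=0.3, size=(10, 2))
class1 = rng.normal(loc=+1.0, scale=0.3, size=(10, 2))

# With the backbone frozen, fine-tuning here is just fitting class
# centroids in feature space (a nearest-centroid classifier).
c0 = backbone_features(class0).mean(axis=0)
c1 = backbone_features(class1).mean(axis=0)

def classify(x):
    f = backbone_features(x)
    return int(np.linalg.norm(f - c1) < np.linalg.norm(f - c0))

pred_a = classify(np.array([-1.2, -0.8]))   # sample near class 0
pred_b = classify(np.array([1.1, 0.9]))     # sample near class 1
```

In practice the same idea appears as a linear probe or lightweight head on frozen backbone features, which is why tens of samples per class can be enough.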
Foundation Models Are Great, but Not Perfect
[Diagram: zero-shot inference architecture for open-vocabulary detection. An image backbone (Swin-B) produces image features and a text backbone (BERT-B) encodes a prompt such as "person wearing glasses"; a feature enhancer fuses the cross-modal features, language-guided query selection picks queries, and a cross-modality decoder with cross-modality feature fusion produces the detections.]
Use Cases:
• SyntheticaDETR: industrial and indoor asset detection
• FoundationPose: 6D pose estimation and tracking of novel objects
• FoundationGrasp: grasp identification and annotation
• cuMotion: GPU-accelerated trajectory planning
Faster development time, accelerated robot motion generation, and highly accurate, performant modules.
[Diagram: Kubernetes cluster node stack. The controller schedules pods; the NVIDIA Container Runtime allows building and running GPU-accelerated containers on top of the NVIDIA driver and the NVIDIA GPU.]
INTRODUCING THE GPU OPERATOR
Scale GPU-Accelerated Kubernetes Without Development
GPU Operator Ecosystem
Enabling Infrastructure Teams
Container Platforms
Container Engines
Operating Systems
“We were able to get up and running within a day versus several months of onboarding with existing cloud service providers.” - Andrew Ferguson, Co-founder, Evozyne
“DGX Cloud is the gold standard … Distributed training was more than twice as fast as other leading services…” - Michael Royzen, Co-Founder & CEO
“On-demand access to powerful computing with DGX Cloud enabled... an estimated 100X speed-up in our antibody engineering pipeline” - Thomas Bourquard, CSO and co-founder, MAbSilico