AI & Cloud: Next-Gen Enterprise Guide

Navigate your way in the next wave of AI & Cloud

Ettikan Kandasamy Karuppiah (Ph.D.)
Director/Technologist, ROAP Region, NVIDIA
28th June 2024
Generative AI is a Subset of Deep Learning

• Artificial Intelligence: solves a task by making computers mimic human behavior (learn, analyze, predict)
• Machine Learning: trains a machine to solve a specific AI problem (data + statistics & math → prediction)
• Deep Learning: uses neural networks to solve an AI problem (data + neural networks → prediction)
• Generative AI: learns patterns and trends from the training data using neural networks, understands the context, and creates new content that mimics human-generated content (training data → foundation models built on the transformer architecture → new and unique content)

• Foundation Models are AI neural networks trained on massive unlabeled datasets to handle a wide variety of jobs.
• A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each
part of the input data.
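The self-attention mechanism described in the second bullet can be sketched in a few lines of NumPy (single head, illustrative shapes):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices.
    Returns one context vector per token, each a weighted mix of all tokens.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # relevance of every token to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V                               # differentially weighted values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

This is the "differential weighting" in the bullet above: the softmax row for each token decides how much every other token contributes to its output.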
The Next Era of Generative AI

Timeline (figure): RESNET-50 (image classification on labeled datasets) → BERT and Turing NLG (language models on unlabeled datasets) → GPT-3, NLLB (generative chatbots) → GPT-4, Gemini (multimodal generative AI) → Mixtral (mixture of experts, MoE) → production GenAI inference. All transformer-based large language models.

• Modalities: text, audio, image, video, 3D, DNA, protein, molecule, animation
• Real-time serving: <50 ms latency
• Parameters: >10T
• Sequence length: >32K input
Explosive Growth in AI Computational Requirements

Chart (training compute in petaFLOPs, 2012-2024): before transformers, compute grew ~8x every 2 years; since transformers, ~256x every 2 years. Milestones include AlexNet, VGG-19, ResNet50, InceptionV3, Xception, DenseNet201, ResNeXt, Seq2Seq, ELMo, the original Transformer, BERT Large, GPT-1, XLNet, MoCo, Wav2Vec 2.0, GPT-2 (1.5B), Microsoft T-NLG, GPT-3 (175B), Megatron-NLG, MT-NLG (530B), Chinchilla, BLOOM, PaLM, and GPT-MoE-1.8T.
Executives Adding Gen AI to Strategy

Source: PwC Survey, Aug. 2023


The Amount of Enterprise Data is Massive & Growing
NVIDIA Accelerated Retrieval-Augmented Generation

• Unique enterprise data created: 11 ZB in 2024, projected to reach 20 ZB by 2027 (roughly 800,000 Libraries of Congress)
• 83% of it is unstructured data; 50% is audio and video

Source: IDC Global DataSphere
GPT-MoE 1.8T Inference

Chart (throughput per GPU in tokens/s vs. interactivity per user in tokens/s; seqlen = 32k input / 1k output, first-token latency = 5 s): GB200 FP4 sustains far higher throughput at a given interactivity than B200 FP8 and H200 FP8 — Blackwell delivers up to 30X Hopper. Each point is a multi-dimensional optimization over parallelism strategies:

• Tensor parallel (TP)
• Pipeline parallel (PP)
• Expert parallel (EP)
• Data parallel (DP)

Configurations on the curves range from TP64 to combinations such as TP2.EP8.DP4, TP2.EP16.PP2, TP4.EP16, TP4.EP2.PP2.DP4, TP8.PP4.DP2, and TP8.PP2.DP4.
Accelerating Data Processing with the Decompression Engine

Database join query performance (queries/second, normalized): x86 = 1X, HGX B200 = 8X, GB200 NVL72 = 18X.

• Supported formats: Deflate, Snappy, and LZ4
• Grace CPU memory and the high-speed C2C link give Blackwell rapid access to large databases

Projected performance subject to change. Database join query with Snappy / Deflate compression derived from TPC-H Q4 query. 1x x86, 1x H100 GPU, and 1x Blackwell GPU from Grace Blackwell GB200 Superchip.
New Era of Secure AI
Confidential Computing for Performant Massive LLMs

• Performant end-to-end AI security: same performance, with every channel encrypted — CPU secure enclave, PCIe, NIC, and NVLink across up to 128 Blackwell GPUs
• Enabling a distributed AI ecosystem: every contributor — users, data, foundation model, cloud service, and compute infrastructure — can protect their IP
NVIDIA Inference Microservices (NIM) for Generative AI Inference
Available now as part of NVIDIA AI Enterprise 5.0; $4,500/GPU/year or $1/GPU/hr

What a NIM packages:
• Prebuilt container and Helm chart
• Industry-standard APIs
• Support for custom models
• Domain-specific code
• Optimized inference engines

Benefits:
• Deploy anywhere and maintain control of generative AI applications and data
• Simplified development of AI applications that can run in enterprise environments
• Day-0 support for all generative AI models, providing choice across the ecosystem
• Improved TCO with best latency and throughput running on accelerated infrastructure
• Best accuracy for the enterprise by enabling tuning with proprietary data sources
• Enterprise software with feature branches, validation, and support
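Because NIMs expose industry-standard APIs, a deployed endpoint can be called with any OpenAI-style HTTP client. A minimal standard-library sketch — the URL, port, and model name here are assumptions for a hypothetical local deployment, not documented defaults:

```python
import json
import urllib.request

# Hypothetical local NIM endpoint; NIM containers expose an
# OpenAI-style chat completions API, so generic clients work.
NIM_URL = "http://localhost:8000/v1/chat/completions"  # assumed host/port

def build_request(model, prompt, max_tokens=256):
    """Build an OpenAI-style chat completion request for a NIM endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("meta/llama2-70b", "Summarize our Q3 support tickets.")
# urllib.request.urlopen(req) would send it to a running NIM container.
print(req.get_full_url())
```

Sending the request is left commented out since it requires a running container; the point is that no vendor-specific SDK is needed.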
NVIDIA Inception Helps Tech Startups Build and Grow Faster
Open to tech startups of all stages working in AI, AR/VR, Gaming, Networking, and Graphics

Partners shown: Astria, Audoir, Promethean AI
Fastest Path to Production RAG Solutions

1. Identify POC use case — Start with internal use cases that have no data access or IP concerns. Ideal data: support docs, corporate communications, employee-experience HR tools, sales communications.
2. Set up POC environment — Create the POC environment where your data is stored today. Cloud provides a fast path to get started. On-prem, plan for environments that can scale to support multiple POCs.
3. Build RAG workflow — Leverage the NVIDIA AI workflow for a step-by-step guide to building generative AI chatbots connected to an enterprise knowledge base with retrieval-augmented generation.
4. Evaluate performance — Begin with public benchmarks and establish an internal test set to gauge accuracy and latency requirements. Move to human testing to ensure the workflow performs as expected.
5. Scale deployment — Move the workload to a production environment and scale based on use. Measure usage, accuracy, and human feedback to determine the best scale options.
Optimized Retrieval-Augmented Generation with NVIDIA
Next Generation of Enterprise Applications Connect LLMs to Enterprise Data
Retrieval-Augmented Generation Improves LLM Performance and Efficiency

• Natural language interface — models can answer questions about information without having been trained on that data
• Contextual understanding — human-readable output texts that are easier for people to understand, raising user trust
• Improved accuracy — AI models better understand context when generating text or other outputs
• Reduced computational costs — from retraining and model size at inference
• Improved efficiency — models can produce diverse outputs without sacrificing accuracy or efficiency
Enterprise-Ready Generative AI with RAG and NVIDIA NIM
Ease the journey from pilot to production

Development Deployment

NVIDIA AI Enterprise

https://www.nvidia.com/en-us/ai-data-science/ai-workflows/generative-ai-chatbots/
NeMo Retriever Supercharges RAG Applications
World-class accuracy and throughput

Pipeline (NIM-based, with optimized inference engines): text docs, images, and PDFs flow through embedding microservices into a vector database; at query time, retrieval and reranking microservices serve the prompt, while a text-to-SQL microservice handles structured data (ERP, CRM).

• 2X — world-class accuracy, with nearly 2x fewer incorrect answers
• 7X — faster embedding inference throughput
• World-class models and community model support
• Flexible and modular deployment
• Customizable models and pipelines
• Production ready
RAG Applications Depend on High-Quality Text Embeddings

Example — prompt: "What is the correct dose for a 2-year-old?"

• Commercial embedding model (low-quality embeddings): captures only Dose, Ingredients, Source — response: "The correct dose is 7.5 mL"
• NVIDIA Retrieval QA embedding model (high-quality embeddings): captures Age, Dose, Ingredients, Expiration, Lot #, Source — response: "For a 2-year-old the correct dose is 5 mL"
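The dosage example above comes down to which label features the embedding captures. A deliberately simplified sketch — plain feature sets stand in for dense embeddings — shows how losing the "age" signal retrieves the wrong chunk:

```python
# Two chunks from a hypothetical medicine label, as simple feature sets.
chunks = {
    "Adults: take 7.5 mL every 6 hours": {"dose", "ingredients", "source"},
    "Children age 2-5: take 5 mL every 6 hours": {"age", "dose", "ingredients", "source"},
}
query_features = {"age", "dose"}   # "What is the correct dose for a 2-year-old?"

def retrieve(drop=frozenset()):
    """Score chunks by feature overlap; `drop` simulates features the
    embedding model fails to capture (e.g. age, expiration, lot #)."""
    def score(feats):
        return len((feats - drop) & (query_features - drop))
    return max(chunks, key=lambda c: score(chunks[c]))

print(retrieve(drop={"age"}))   # age signal lost: ties on "dose", returns the adult chunk
print(retrieve())               # age captured: returns the children's chunk
```

Real embedding models fail more subtly than dropping a whole feature, but the mechanism is the same: whatever the embedding does not encode cannot be retrieved on.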
Gen AI for Technician Support

Information retrieval for technical documents
• Two-volume "Aviation Maintenance Technician Handbook — Airframe" FAA manual (~1,200 pages in total)

NeMo LLM models
• 43B model (4K token limit)
• 22B model (16K token limit)

Features
• Ingest (embed) large documents into a vector database for semantic search
• Cite sources in retrieved answers
• Extract and cite images and captions*
• Guardrails (no hallucination)

* on the product roadmap

NVIDIA Inference Microservices (NIM)
Experience and Run Enterprise Generative AI Models Anywhere
Seamlessly integrate AI into business applications with NVIDIA AI APIs — ai.nvidia.com

• Use cases: text summarization, speech generation, drug discovery, visual content
• Properties: security, data privacy, performance-optimized, enterprise support
• Path: experience models → prototype with APIs → deploy with NIMs
Anatomy of a NIM
NIM Made Easy

A NIM is a layered, cloud-native container:

• Industry-standard APIs — text | speech | image | video | 3D | biology
• Cloud-native container — K8s support, metrics & monitoring, identity, secret management, liveness probes
• Triton Inference Server — cuDF | CV-CUDA | DALI | NCCL; pre-processing and post-processing; in-flight batching (Triton spans 417 packages/libraries across OSS, third-party, and NVIDIA)
• TensorRT engine — cuBLAS | cuDNN | in-flight batching | memory optimization | FP8 quantization (TensorRT spans 333 packages/libraries across OSS, third-party, and NVIDIA)
• Customization cache — LoRA | p-tuning
• AI model — text-to-text | text-to-image | text-to-3D | multimodal | ASR | text-to-speech
NVIDIA NIM for Every Domain
Vertical-specific NIMs are available and growing

• Language NIMs: Code Llama 70B, Cohere 35B, Gemma 7B, Jamba, Llama 2 70B, Mistral 7B, Mixtral 8x7B, Nemotron-3, Phi-2
• Visual / multimodal NIMs: Adept, Deplot, Edify (Getty), Edify (Shutterstock), FuYu, Kosmos-2, NeVA, SDXL, SDXL Turbo
• Digital human NIMs: Audio2Face, Riva ASR, Persona
• Optimization / simulation NIMs: cuOpt, Earth-2
• Digital biology NIMs: DeepVariant, DiffDock, ESMFold, MolMIM, Vista 3D 1.0
• Application NIMs: Llama Guard, Retrieval Embedding, Retrieval Reranking
NVIDIA Riva: Using Speech AI for Transcription, Translation, and Voice
Highlight reel demo video

NIM Demo with a Community Model: Typhoon
With a custom tokenizer — https://huggingface.co/scb10x/typhoon-7b

Simplifying Model Inference with NIM
You can bring in any community model, e.g. https://huggingface.co/scb10x/typhoon-7b

0. Set up the retrieval pipeline — in just a few lines
1. Create a document collection
2. Upload documents
NIM with the Typhoon 7B Community Model
Demo
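The three numbered steps above (set up the pipeline, create a collection, upload documents) can be sketched as a client workflow. The class and method names here are hypothetical placeholders for illustration, not a real NVIDIA SDK API:

```python
# Hypothetical retrieval-pipeline client illustrating the three steps
# on the slide; names are placeholders, not a real NVIDIA SDK.
class RetrievalClient:
    def __init__(self, base_url):
        self.base_url = base_url      # where the pipeline services run
        self.collections = {}

    def create_collection(self, name):          # step 1
        self.collections[name] = []
        return name

    def upload(self, collection, paths):        # step 2: embed + index docs
        self.collections[collection].extend(paths)
        return len(self.collections[collection])

client = RetrievalClient("http://localhost:8000")            # step 0
col = client.create_collection("faa-handbook")
n = client.upload(col, ["airframe-vol1.pdf", "airframe-vol2.pdf"])
print(n)  # 2
```

The shape of the workflow — a one-time setup, a named collection, then document uploads — is what "in just a few lines" refers to.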
NVIDIA NeMo Framework
Building Generative AI Applications for the Enterprise
Build, customize, and deploy generative AI models with NVIDIA NeMo

Pipeline (model development → enterprise application deployment, all on NVIDIA AI Enterprise):
• Data curation — NeMo Curator: extract, deduplicate, and filter information from large unstructured data at scale
• Distributed training — Megatron Core: accelerate training and throughput by parallelizing the model and the training data across thousands of nodes
• Model customization — NeMo Aligner: easily customize with p-tuning, SFT, adapters, RLHF, AliBi
• Accelerated inference — Triton & TensorRT-LLM: run optimized inference at scale, anywhere
• Retrieval-augmented generation — NeMo Retriever: in-domain, secure, cited responses to in-domain queries
• Guardrails — NeMo Guardrails: keep applications aligned with safety and security requirements

Multi-modality support: build language, image, and generative AI models. NVIDIA AI Enterprise keeps projects on track with experts by your side.
Suite of Model Customization Tools in NeMo
Ways to customize large language models for your use cases. Data, compute & investment increase left to right, as does accuracy for specific use cases.

Prompt engineering
• Techniques: few-shot learning, chain-of-thought reasoning, system prompting
• Benefits: good results leveraging pre-trained LLMs; lowest investment; least expertise
• Challenges: cannot add as many skills or as much domain-specific data to the pre-trained LLM

Prompt learning
• Techniques: prompt tuning, p-tuning
• Benefits: better results leveraging pre-trained LLMs; lower investment; will not forget old skills
• Challenges: less comprehensive ability to change all model parameters

Parameter-efficient fine-tuning (PEFT)
• Techniques: adapters, LoRA, IA3
• Benefits: best results leveraging pre-trained LLMs; will not forget old skills
• Challenges: medium investment; takes longer to train; more expertise needed

Fine-tuning
• Techniques: SFT, RLHF, SteerLM
• Benefits: best results leveraging pre-trained LLMs; changes all model parameters
• Challenges: may forget old skills; large investment; most expertise needed
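Of the PEFT techniques listed, LoRA is the most widely used: the pretrained weight W stays frozen while a low-rank update B·A is trained. A minimal NumPy sketch of the idea:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                       # model dim, LoRA rank (r << d)

W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection; zero-init so
                                    # training starts exactly at the base model

def lora_forward(x):
    # Effective weight is W + B @ A, but it is never materialized:
    # only A and B (2*d*r parameters) are updated during training.
    return W @ x + B @ (A @ x)

x = rng.normal(size=d)
print(np.allclose(lora_forward(x), W @ x))   # True at init: the adapter is a no-op

full, lora = d * d, 2 * d * r
print(f"trainable params: {lora} vs {full} ({lora / full:.1%})")
```

With rank 8 on a 512-dim layer, only ~3% of the parameters are trainable, which is why the table lists "will not forget old skills" as a PEFT benefit: the base weights never move.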
Beyond Text Generation: Building Generative AI Applications for Images and Videos
NVIDIA Optimized Visual Foundation Models

Model — Purpose
• NV-DINOv2 — Vision-only backbone for downstream Vision AI tasks: image classification, detection, segmentation
• NV-CLIP — Image-text matching model that aligns image features with text features; backbone for downstream Vision AI tasks: image classification, detection, segmentation
• Grounding-DINO — Open-vocabulary object detection with text prompts as input
• EfficientViT-SAM — Faster, more efficient version of SAM (Segment Anything Model), a visual foundation model for segmenting any object based on different types of visual prompts, such as a single coordinate or a bounding box
• VILA — Family of visual language models for image and video understanding and Q&A
• LITA — Visual language model for video understanding and context, with spatial and temporal localization
• FoundationPose — 6-DoF object pose estimation and tracking, providing the object pose and 3D bounding box
• BEVFusion — Sensor fusion model that fuses multiple input sensors (cameras, LiDAR, radar, etc.) to create a bird's-eye view of the scene with 3D bounding-box representations of the objects
• NeVA — Multi-modal visual language model for image understanding and Q&A
• LiDARSAM — Segments any object based on user-provided text prompts on 3D point-cloud LiDAR data
Fine-Tune with 100 or Fewer Samples for Image Classification

Train with as few as 10 samples.

Chart (few-shot learning, classification accuracy vs. number of training samples, 10 to 1,000): NV-DINOv2 outperforms GC-ViT across the range on PCB defect classification.

Workflow: the frozen NV-DINOv2 foundational model, trained on >100M image/text pairs, turns the dataset into feature vectors; a lightweight head is fine-tuned with TAO against ground-truth labels, and predictions come from inference on the foundational model.

Demo: foundational model fine-tuning.
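The frozen-backbone workflow above can be illustrated with a toy sketch: pretend the vectors below came out of a frozen backbone, then "fine-tune" nothing more than a nearest-centroid classifier on a handful of labeled samples (all data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are feature vectors from a frozen backbone:
# 10 labeled samples per class ("defect" vs "ok"), 16-dim features.
defect = rng.normal(loc=1.0, size=(10, 16))
ok = rng.normal(loc=-1.0, size=(10, 16))

# Few-shot "fine-tuning": with the backbone frozen, fitting the head
# can be as simple as computing one centroid per class.
centroids = {"defect": defect.mean(axis=0), "ok": ok.mean(axis=0)}

def classify(feat):
    """Assign the class whose centroid is nearest in feature space."""
    return min(centroids, key=lambda c: np.linalg.norm(feat - centroids[c]))

sample = rng.normal(loc=1.0, size=16)   # an unseen defect-like sample
print(classify(sample))
```

This only works because a strong backbone places same-class samples close together in feature space; that separability is what the >100M-pair pretraining buys, and why 10 labeled samples per class can suffice.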


Using Foundation Backbones for Downstream CV Tasks

Foundation backbones (NV-DINO / NV-CLIP) turn data into feature vectors that feed downstream heads:

• Classification → class label
• Detection → bounding box & labels
• Segmentation → class label per pixel (mask)
• Zero-shot → class label, bounding box, mask, text
• Image retrieval → image
• VLMs → text, image
• Diffusion → image
Foundation Models Are Great, but Not Perfect
Zero-shot inference

Examples (before fine-tuning vs. after fine-tuning):
• Video summarization — prompt: "Person with basketball" / "Player holding basketball"; question: "What type of shot is this?"; answer: "This is a basketball shot"
• Change detection — reference vs. test image, with the predicted change compared against ground truth
Zero-Shot Detection with Context Using Grounding-DINO

Architecture: an image backbone (Swin-B) extracts image features and a text backbone (BERT-B) extracts text features from the prompt (e.g., "person wearing glasses"); a feature enhancer performs cross-modality feature fusion, language-guided query selection picks cross-modality queries, and a cross-modality decoder produces the detections.

Use cases: image search, detecting anomalies, unseen environments, auto-labeling.

Robotics: Isaac Manipulator
Collection of Modular GPU-Accelerated Libraries & Foundation Models

For ISVs & SIs, OEMs, and developer platforms — smart manufacturing, warehouse and logistics.

• SyntheticaDETR — industrial and indoor asset detection
• FoundationPose — 6D pose estimation and tracking of novel objects
• FoundationGrasp — grasp identification and annotation
• cuMotion — GPU-accelerated trajectory planning

Faster development time; accelerated robot motion generation; highly accurate & performant modules.

Availability: Developer Preview in May 2024

BE CLOUD NATIVE
What does a GPU K8s cluster look like?

• Control plane — API server, etcd, scheduler, controller
• GPU worker node — runs containers and communicates with the control plane:
  - Kubelet — services interface to local containers
  - Proxy
  - Device plugin — advertises devices, e.g. NVIDIA GPUs
  - Container runtime — allows building and running GPU-accelerated containers
  - NVIDIA driver
  - NVIDIA GPU
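Once the device plugin advertises GPUs, workloads request them through standard Kubernetes resource limits on `nvidia.com/gpu`. A minimal pod spec, built here as a Python dict (the container image tag is illustrative):

```python
import json

# Minimal pod spec requesting one NVIDIA GPU. The device plugin shown
# above advertises the `nvidia.com/gpu` resource this limit consumes.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "cuda-test"},
    "spec": {
        "containers": [{
            "name": "cuda",
            "image": "nvidia/cuda:12.4.1-base-ubuntu22.04",  # illustrative tag
            "command": ["nvidia-smi"],
            "resources": {"limits": {"nvidia.com/gpu": 1}},
        }],
        "restartPolicy": "Never",
    },
}
print(json.dumps(pod, indent=2))
```

Applied with `kubectl apply -f` (after dumping to YAML or JSON), the scheduler places this pod only on a node where the device plugin has advertised a free GPU.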
INTRODUCING THE GPU OPERATOR
Scale GPU-Accelerated Kubernetes Without Development

GPU Operator Ecosystem
Enabling Infrastructure Teams

Container Platforms

Container Engines

Operating Systems

NVIDIA Certified Systems NVIDIA Physical and Virtual GPUs

Find the Support Matrix here: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/platform-support.html


Support via NVIDIA AI Enterprise
• GPU Operator is included in NVIDIA AI Enterprise
• Hosted on the NGC Enterprise Catalog and pre-configured for use with NVIDIA vGPUs
• Prebuilt vGPU driver image (only available with NVIDIA AI Enterprise)
See How Inception Partners Are Using DGX Cloud
DGX Cloud enables fast development in an NVIDIA-native environment

• "We were able to get up and running within a day versus several months of onboarding with existing cloud service providers." — Andrew Ferguson, Co-founder, Evozyne
• "DGX Cloud is the gold standard … Distributed training was more than twice as fast as other leading services…" — Michael Royzen, Co-Founder & CEO
• "On-demand access to powerful computing with DGX Cloud enabled... an estimated 100X speed-up in our antibody engineering pipeline" — Thomas Bourquard, CSO and co-founder, MAbSilico
• "DGX Cloud gave me the freedom to easily fit large models and perform multiple hyperparameter sweeps. This removes a lot of the guesswork and helps us focus on outcome-driven research." — Andrew Ferguson, Co-founder, Evozyne
• "DGX Cloud lets me go from a single node to a 32-GPU cluster with push-button simplicity." — Andrew Ferguson, Co-founder, Evozyne


NVIDIA Developer Program
The Community that Builds

Join the Community


Program Benefits:
Tools
• 650+ exclusive SDKs and models
• GPU-optimized software, model scripts, and containerized apps
• Early access programs
Training
• Research papers, technical documentation, webinars, blogs, and news
• Technical training and certification opportunities
• 1,000s of on-demand technical sessions from industry events
Community
• NVIDIA developer forums
• Exclusive meetups, hackathons, and events
