1
ALISON B LOWNDES
AI DevRel | EMEA
@alisonblowndes
March 2019
2
www.FrontierDevelopmentLab.org
4
5
6
The day job
AUTOMOTIVE: Auto sensors reporting location, problems
COMMUNICATIONS: Location-based advertising
CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, problems
FINANCIAL SERVICES: Risk & portfolio analysis; new products
EDUCATION & RESEARCH: Experiment sensor analysis
HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg. quality; warranty analysis
LIFE SCIENCES
MEDIA/ENTERTAINMENT: Viewers / advertising effectiveness
ON-LINE SERVICES / SOCIAL MEDIA: People & career matching
HEALTH CARE: Patient sensors, monitoring, EHRs
OIL & GAS: Drilling exploration sensor analysis
RETAIL: Consumer sentiment
TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows
UTILITIES: Smart meter analysis for network capacity
LAW ENFORCEMENT & DEFENSE: Threat analysis (social media monitoring, photo analysis)
7
SELECTING THE RIGHT GPU SOLUTION
8
NVIDIA CUDA-X AI ECOSYSTEM
Frameworks | Cloud | Deployment
Platforms: Workstation, Server, Cloud
Workloads: Data Analytics (DA), Graph, ML, DL Training, DL Inference
Cloud services: Amazon SageMaker, Serving, Amazon SageMaker Neo, Google Cloud ML, Azure Machine Learning
Foundation: CUDA-X AI, built on CUDA
9
CUDA X ECOSYSTEM
PRogrammable Acceleration across multiple Domains with one Architecture (PRADA)
From ease of use to specialized performance:
Applications | Frameworks | Libraries | Directives and Standard Languages | Extended Standard Languages (CUDA-C++, CUDA Fortran)
Users: GPU Users | Domain Specialists | Problem Specialists | New Algorithm Developers and Optimization Experts
10
THE NEW NGC
GPU-optimized Software Hub. Simplifying DL, ML and HPC Workflows
NGC
50+ Containers
DL, ML, HPC
50+ Pre-trained Models
NLP, Classification, Object Detection & more
Industry Workflows
Medical Imaging, Intelligent Video Analytics
10+ Model Training Scripts
NLP, Image Classification, Object Detection & more
Innovate Faster
Deploy Anywhere
Simplify Deployments
11
TESLA V100
TENSOR CORE GPU
World’s Most Advanced
Data Center GPU
5,120 CUDA cores
640 NEW Tensor cores
7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS
125 Tensor TFLOPS
32 GB HBM2 @ 900GB/s | 300GB/s NVLink
12
THE PATH FORWARD
CPU + Accelerator | Simulation + AI | Full-stack Optimization | FP64 + Multi-Precision
Peak TFLOPS, P100: 5.3 FP64 | 10.6 FP32 | 21.2 FP16
Peak TFLOPS, V100: 7.8 FP64 | 15.7 FP32 | 125 Tensor
NETWORK COMPLEXITY IS EXPLODING
14
PRICING DERIVATIVES
COMPARING CPU TO GPU
Louis Scott, March 2018
European Options (SPX): Monte Carlo Simulation
American Options (SPY): Finite Difference Method
Finite Difference Method, 105 days to option expiration
Hardware | Compute Time (seconds) | Speed-Up Factor
CPU - i7 5630 | 146 |
GPU - V100 | 0.268 | 544x faster
Monte Carlo, 2M simulations, 4 options, 105 days to option expiration
Hardware | Compute Time (seconds) | Speed-Up Factor
CPU - i7 5630 | 47 |
GPU - V100 | 0.18 | 267x faster
http://on-demand.gputechconf.com/gtc/2018/video/S8123/
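The Monte Carlo rows of the table price options by averaging discounted payoffs over many simulated paths. Every path is independent, which is why the work maps well onto thousands of GPU threads. The sketch below is a plain-Python CPU reference, assuming a European call under geometric Brownian motion with illustrative parameters; it is not the benchmarked SPX implementation, and the Black-Scholes closed form is included only to check the estimate.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0, k, r, sigma, t):
    """Black-Scholes closed-form price of a European call (reference value)."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def mc_call(s0, k, r, sigma, t, n_paths=200_000, seed=0):
    """Monte Carlo price of a European call under geometric Brownian motion."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    payoff_sum = 0.0
    for _ in range(n_paths):  # each iteration is independent: one GPU thread per path
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))  # terminal price
        payoff_sum += max(s_t - k, 0.0)
    return math.exp(-r * t) * payoff_sum / n_paths


print(bs_call(100.0, 100.0, 0.05, 0.2, 1.0))  # closed form, about 10.4506
print(mc_call(100.0, 100.0, 0.05, 0.2, 1.0))  # agrees within sampling error
```

The Monte Carlo standard error shrinks only as 1/sqrt(n_paths), so accuracy is bought with many more paths; that is exactly the kind of scaling a GPU absorbs.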
15
NVIDIA RESEARCH
16
HOW
17
18
GPU
POWERED
WORKFLOW
DAY IN THE LIFE OF A DATA SCIENTIST
Train Model
Validate
Test Model
Experiment with
Optimizations and
Repeat
Go Home on Time
Dataset
Downloads
Overnight
Start
GET A COFFEE
Stay Late
Restart Data Prep
Workflow Again
Find Unexpected Null
Values Stored as String…
Switch to Decaf
CPU
POWERED
WORKFLOW
Restart Data Prep
Workflow
@*#! Forgot to Add
a Feature
ANOTHER…
GET A COFFEE
Start Data Prep
Workflow
GET A COFFEE
Configure Data Prep
Workflow
Dataset
Downloads
Overnight
Dataset Collection → Analysis → Data Prep → Train → Inference
NATURAL LANGUAGE
PROCESSING FOR SIGNAL
GENERATION ON NEWS DATA
WORD EMBEDDINGS
GloVe (Global Vectors for Word Representation) uses the word-to-word co-occurrence statistics from a given corpus.
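GloVe's input is a word-word co-occurrence matrix X; vectors are then fitted so that w_i·w̃_j + b_i + b̃_j approximates log X_ij, with each pair weighted by f(X_ij) = (X_ij / x_max)^α, capped at 1 (the GloVe paper uses x_max = 100 and α = 3/4, and weights a pair d words apart by 1/d). The sketch below shows only the counting step and the weighting function, on a made-up headline; the window size is an assumption and the vector fitting itself is left out.

```python
from collections import defaultdict


def cooccurrence(tokens, window=2):
    """Symmetric co-occurrence counts; a pair d words apart adds 1/d."""
    counts = defaultdict(float)
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), i):  # context words to the left
            weight = 1.0 / (i - j)
            counts[(center, tokens[j])] += weight
            counts[(tokens[j], center)] += weight  # keep the matrix symmetric
    return dict(counts)


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): damps rare pairs, caps frequent pairs at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


headline = "fed raises rates as inflation rises".split()
X = cooccurrence(headline)
print(X[("raises", "rates")], X[("fed", "rates")])
```

Counting is a sparse, embarrassingly parallel pass over the corpus, which is why it is usually the first thing accelerated for large news archives.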
22
Algorithmic Trading using Deep Autoencoder
based Statistical Arbitrage
NVIDIA Deep Learning Institute
23
Moving Average – Mean Reversion
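A moving-average mean-reversion rule compares the latest price with its rolling average and bets that large deviations shrink back. The sketch below is a generic z-score version, not the Deep Learning Institute lab's strategy; the 20-period lookback and the ±2 standard-deviation entry threshold are illustrative assumptions.

```python
from statistics import mean, pstdev


def mean_reversion_signals(prices, lookback=20, entry_z=2.0):
    """One signal per price: +1 (long), -1 (short) or 0 (flat).

    z = (price - rolling mean) / rolling standard deviation, where the
    rolling window includes the current price.
    """
    signals = []
    for t in range(len(prices)):
        if t + 1 < lookback:
            signals.append(0)  # not enough history yet
            continue
        window = prices[t + 1 - lookback : t + 1]
        mu, sd = mean(window), pstdev(window)
        z = 0.0 if sd == 0 else (prices[t] - mu) / sd
        if z > entry_z:
            signals.append(-1)  # stretched above the average: expect a fall
        elif z < -entry_z:
            signals.append(1)   # stretched below the average: expect a rise
        else:
            signals.append(0)
    return signals


print(mean_reversion_signals([100.0] * 19 + [90.0])[-1])
```

The autoencoder approach named on the previous slide replaces the single moving average with a learned "fair value" built from many related instruments; the trading rule on the residual stays similar in spirit.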
24
Autoencoder
25
26
Snorkel
https://github.com/HazyResearch/snorkel
Automation of labelling data
A system for rapidly creating, modeling, and managing training data for domains in which large labelled training sets are not available or easy to obtain.
It learns, essentially, which labelling functions are more accurate than others, and then uses the resulting labels to train a DNN.
A general framework for many weak supervision techniques.
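In weak supervision, users write small labelling functions that vote on each example or abstain. Snorkel learns a label model that weights those votes by each function's estimated accuracy; the sketch below swaps in a plain unweighted majority vote, the usual baseline, to show the idea. The three keyword labelling functions for headline sentiment are hypothetical.

```python
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


# Hypothetical labelling functions: each returns a label or abstains.
def lf_beats(text):
    return POSITIVE if "beats" in text else ABSTAIN


def lf_misses(text):
    return NEGATIVE if "misses" in text else ABSTAIN


def lf_downgrade(text):
    return NEGATIVE if "downgrade" in text else ABSTAIN


LFS = [lf_beats, lf_misses, lf_downgrade]


def majority_vote(text, lfs=LFS):
    """Combine labelling-function votes; ties and all-abstain give ABSTAIN."""
    votes = [lf(text) for lf in lfs]
    pos, neg = votes.count(POSITIVE), votes.count(NEGATIVE)
    if pos > neg:
        return POSITIVE
    if neg > pos:
        return NEGATIVE
    return ABSTAIN


print(majority_vote("acme misses estimates, analysts downgrade"))  # 0 (NEGATIVE)
```

Examples that end up ABSTAIN are simply dropped from the DNN's training set; a learned label model resolves more of the conflicts than a majority vote can.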
27
AUTO ML: AI CREATING AI
http://automl.chalearn.org/
Definitions
29
“the machine equivalent of experience”
30
WHAT IS A TENSOR?
And why do they flow?
31
WHAT IS A TENSOR?
And why do they flow?
A scalar is a single number with 0 indices: a
A vector is a list of numbers with 1 index of length k: a[k]
A matrix is a list of numbers with 2 indices of lengths r, c: a[r,c]
A tensor is a list of numbers with m indices of lengths n1, n2, …, nm: a[n1,…,nm]
i.e. an m-dimensional array
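The definitions above can be made concrete with nested Python lists: a tensor's shape is one length per index, and its rank is the number of indices. This is a plain-Python illustration; frameworks such as TensorFlow and PyTorch store the same idea as contiguous arrays with a shape attribute.

```python
def shape(a):
    """Shape of a nested-list tensor: one length per index (assumes rectangular)."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0] if a else None
    return tuple(dims)


scalar = 3.0                                                # 0 indices
vector = [1.0, 2.0, 3.0]                                    # 1 index, k = 3
matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]                 # 2 indices, r = 2, c = 3
tensor = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]  # 3 indices: 2, 3, 4

print(shape(scalar), shape(vector), shape(matrix), shape(tensor))
# () (3,) (2, 3) (2, 3, 4)
```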
32
Origin of Neural Networks
Input Output
33
A Simple Neuron
Inputs x1, x2 → weighted inputs w1x1, w2x2 → Neuron → Output y
34
Neural Network Basics
1st step: Activations * Weights
2nd step: sum
3rd step: activate
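The three steps map directly onto code. The slide does not name an activation function, so the logistic sigmoid and the optional bias term below are assumptions made for illustration.

```python
import math


def neuron(inputs, weights, bias=0.0):
    # 1st step: multiply each input activation by its weight
    weighted = [x * w for x, w in zip(inputs, weights)]
    # 2nd step: sum the weighted inputs (plus a bias term)
    z = sum(weighted) + bias
    # 3rd step: activate, here with the logistic sigmoid
    return 1.0 / (1.0 + math.exp(-z))


y = neuron([1.0, 2.0], [0.5, -0.25])  # z = 0.5 - 0.5 = 0, so y = sigmoid(0) = 0.5
print(y)
```

Training adjusts `weights` and `bias`; the three steps themselves never change.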
35
Combining Neurons
x1
x2
x3
x4
x5
• Additional neurons can be added to create a layer
• Multiple layers can also be added, resulting in input, hidden, and output layers
• Expanding the neural network size increases its capacity to model complex relationships
• In fully connected feed-forward networks, each neuron is connected to every neuron in the adjacent layers
y
Input Layer | Hidden Layers | Output Layer
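Stacking the single neuron into fully connected layers gives a feed-forward network: each layer's outputs become the next layer's inputs. The sketch below wires five inputs (x1 to x5) through one hidden layer of three ReLU neurons to a single output y; all weights are hypothetical, chosen so the result is easy to check by hand.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """Fully connected layer: every output neuron sees every input."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Feed activations forward through a list of (weights, biases, activation) layers."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical network: 5 inputs -> 3 hidden neurons -> 1 output y.
hidden = ([[0.1] * 5, [-0.2] * 5, [0.3, 0.0, 0.0, 0.0, -0.3]], [0.0, 0.0, 0.0], relu)
output = ([[1.0, 1.0, 1.0]], [0.5], identity)
y = forward([1.0, 2.0, 3.0, 4.0, 5.0], [hidden, output])
print(y)  # hidden = [1.5, 0, 0] after ReLU, so y is about [2.0]
```

A "deep" network is the same `forward` loop with many more entries in `layers`.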
36
Deep Neural Networks (DNNs)
x1
x2
x3
x4
x5
Input Layer | Many Hidden Layers | Output Layer
y
37
Image “Volvo XC90”
Image source: “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks” ICML 2009 & Comm. ACM 2011.
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng.
CONVOLUTIONAL NEURAL NETWORKS
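A convolutional layer slides a small kernel over the image and takes a weighted sum at each position; early layers tend to learn edge-like kernels, and deeper layers combine them into parts and objects, as in the hierarchical features pictured on this slide. The sketch below computes a "valid" 2D cross-correlation (what CNN layers actually compute) with a hand-written vertical-edge kernel, rather than a learned one.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation of nested-list image and kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


# A 4x4 image with a dark-to-bright vertical edge, and a vertical-edge kernel.
image = [[0, 0, 1, 1] for _ in range(4)]
kernel = [[-1, 1], [-1, 1]]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The response is non-zero only where the edge is, and the same kernel weights are reused at every position, which is what keeps CNNs small compared with fully connected layers on images.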
38
Yann LeCun (Facebook), ISSCC, Feb 2019
39
ROBUST ML
https://madry-lab.ml
40
http://distill.pub/2017/momentum/
41
Li et al, University of Maryland & US Naval Academy
https://arxiv.org/pdf/1712.09913.pdf
42
AUTOENCODERS
UNSUPERVISED feature learning
Training Data → Encoder → Sparse Representation → Decoder → Reconstruction
Objective: minimize reconstruction error (Reconstruction - Training Data)
Layers: Input Layer → three Hidden Layers → Bottleneck Layer → three Hidden Layers → Output Layer
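The encoder-bottleneck-decoder idea fits in a few lines when everything is linear. The sketch below (an illustration, not the architecture on the slide) trains a 2-D to 1-D to 2-D linear autoencoder with plain SGD on points that lie on the line x2 = 2·x1; afterwards, points on that line reconstruct almost perfectly while off-line points do not. The data, learning rate and epoch count are assumptions.

```python
import random


def train_linear_autoencoder(data, lr=0.01, epochs=500, seed=0):
    """2-D input -> 1-D bottleneck -> 2-D output, SGD on ||x_hat - x||^2."""
    rng = random.Random(seed)
    w_enc = [rng.uniform(0.1, 0.5) for _ in range(2)]  # encoder weights
    w_dec = [rng.uniform(0.1, 0.5) for _ in range(2)]  # decoder weights
    for _ in range(epochs):
        for x in data:
            h = w_enc[0] * x[0] + w_enc[1] * x[1]        # bottleneck code
            r = [w_dec[k] * h - x[k] for k in range(2)]  # reconstruction residual
            grad_h = 2.0 * sum(r[k] * w_dec[k] for k in range(2))
            for k in range(2):
                w_dec[k] -= lr * 2.0 * r[k] * h          # dL/dw_dec
                w_enc[k] -= lr * grad_h * x[k]           # dL/dw_enc via the code h
    return w_enc, w_dec


def reconstruction_error(x, w_enc, w_dec):
    h = w_enc[0] * x[0] + w_enc[1] * x[1]
    return sum((w_dec[k] * h - x[k]) ** 2 for k in range(2))


data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.5, 1.0)]
w_enc, w_dec = train_linear_autoencoder(data)
print(reconstruction_error((0.75, 1.5), w_enc, w_dec))  # near zero: on the learned line
print(reconstruction_error((2.0, -1.0), w_enc, w_dec))  # large: off the learned line
```

A high reconstruction error on a new example is the usual basis for using autoencoders as anomaly detectors, for instance on transaction data.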
43
Long short-term memory (LSTM)
Hochreiter (1991) analysed the vanishing gradient problem: “LSTM falls out of this almost naturally”
Gates (input gate, forget gate, output gate) control the importance of the corresponding activations
Training via backprop unfolded in time
Long time dependencies are preserved as long as the input gate is closed (-) and the forget gate is open (O)
Fig from Vinyals et al, Google, April 2015 (NIC generator)
Fig from Graves, Schmidhuber et al, Supervised Sequence Labelling with RNNs
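The gate behaviour described above can be checked with a single scalar LSTM cell using the standard update c = f·c_prev + i·g and h = o·tanh(c). In the sketch, the gate weights are hypothetical and set by hand: with the input gate held closed and the forget gate open, the stored value survives 50 steps; closing the forget gate erases it after one step.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps gate name -> (w_x, w_h, bias)."""
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))   # how much new information to write
    f = sigmoid(pre("forget"))  # how much of the old cell state to keep
    o = sigmoid(pre("output"))  # how much of the cell state to expose
    g = math.tanh(pre("cell"))  # candidate value to write
    c = f * c_prev + i * g      # gated memory
    h = o * math.tanh(c)        # output / hidden state
    return h, c


def run(params, c0=0.8, steps=50):
    h, c = 0.0, c0
    for _ in range(steps):
        h, c = lstm_step(1.0, h, c, params)
    return c


keep = {"input": (0, 0, -10.0), "forget": (0, 0, 10.0), "output": (0, 0, 0.0), "cell": (1.0, 0, 0.0)}
erase = {"input": (0, 0, -10.0), "forget": (0, 0, -10.0), "output": (0, 0, 0.0), "cell": (1.0, 0, 0.0)}
c_kept, c_forgot = run(keep), run(erase)
print(c_kept, c_forgot)
```

In a trained LSTM the gate pre-activations depend on `x` and `h_prev`, so the network itself learns when to open and close each gate.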
44
ARCHITECTURES
Larger image: http://www.asimovinstitute.org/neural-network-zoo/
45
Types of ML/DL
46
TRAINING VS INFERENCE
47
POET
48
Tabish Rashid et al, Oxford/DeepMind QMIX: https://arxiv.org/pdf/1803.11485.pdf
49
DEEPMIND ALPHA* 1.0 => 2.0 => 0.0
* ..a deeply structured hybrid (Gary Marcus, Jan 2018)
50
51
52
53
COMPUTATIONAL SCALE REQUIRED
3 million labeled images
1 DGX-1 trains 300k labeled images on 1 DNN in 1 day
10 DNNs required for self-driving
10 parallel experiments at all times
100 DGX-1 per car
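The slide's figures multiply out as follows. The slide does not state the turnaround time behind "100 DGX-1 per car"; the sketch assumes one pass over the 3 million images per DNN per experiment, under which 100 systems complete a full cycle of all experiments in about 10 days.

```python
images_per_dnn = 3_000_000    # labeled images (slide)
images_per_dgx_day = 300_000  # 1 DGX-1 trains 300k images on 1 DNN in 1 day (slide)
dnns = 10                     # DNNs required for self-driving (slide)
parallel_experiments = 10     # experiments running at all times (slide)
fleet = 100                   # DGX-1 systems per car (slide)

dgx_days_per_dnn = images_per_dnn / images_per_dgx_day              # 10 DGX-1-days
total_dgx_days = dgx_days_per_dnn * dnns * parallel_experiments     # 1,000 DGX-1-days
turnaround_days = total_dgx_days / fleet  # assumption: one data pass per experiment

print(total_dgx_days, turnaround_days)  # 1000.0 10.0
```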
54
INNOVATIONS & KEY METRICS DISTILLED
NVIDIA SATURNV REFERENCE
Design Best Practices
Lessons learned from building the
world's largest AI Infrastructure
Reference Architectures
Partner Reference Architectures,
Scale up racks
Product Innovation & Quality
Rapid exploration & resolution
of customer issues
55
Illuminate Deep Networks
achler@OptimizingMind.com
Original
Image
Most
Certain
Uncertain
Layer 273: Visualizing expected vs actual outputs
Uncertain
Inputs of two filters which are most uncertain
57
USE CASES
58
DATA SCIENCE
IN FINANCE
Alpha Stock Identification
Analyze Consumers’ Behavior
Anti-fraud API Service Insurance
Campaign And Conversion Analysis
Credit Card Application Approval
Customer service chatbots/routing
Claim Fraud Detection
Evaluate Credit Worthiness
Fraud And Credit Risk Analysis
Fraud Detection
Hedge Fund Management
Risk evaluation
59
Financial investment forecasting involves processing vast amounts of data to derive predictions that can outperform the market. SpaceKnow is using GPUs to extract global macro- and micro-economic activity that helps build high-performance portfolios.
AI-POWERED
INVESTMENT
ALPHA
60
EQUIFAX
Equifax now has NeuroDecision Technology
“NeuroDecision Technology (NDT) is the first regulatory-compliant machine learning credit
scoring system reviewed by regulators and credit scoring experts. This technology develops a
neural network model that improves performance and accuracy, which gives customers the
ability to make more informed business decisions when assessing risk.”
https://investor.equifax.com/news-and-events/news/2018/03-26-2018-143044126
“The executive noted that the neural net has improved its ability to make predictive models
by as much as 15 percent.”
https://www.pymnts.com/innovation/2017/equifax-uses-deep-neural-machine-learning-to-improve-credit-scoring/
Neural nets improved Equifax's predictive models by as much as 15%
61
FRAUD DETECTION
Incumbent firms are using the deep, structured history of customers to do supervised learning on fraud / no fraud in-line with transactions; some use autoencoders and other networks to do latent space clustering to identify fraud after the fact.
Models are trained on a mix of raw transactions over time (RNN) and transaction summary vectors (RNN and CNN). CNNs or fully connected networks have latency advantages when scoring in-line with transactions.
DL has been shown to dramatically reduce false positives in transactional fraud detection!
There are also use cases around speech-to-text transcription for detecting insider trading, etc. Commercial applications are challenged today by industry jargon, accents, and non-English languages such as Mandarin, but these challenges can be overcome.
Discussion
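"Transaction summary vectors" compress a customer's raw transaction history into fixed-length features a CNN or fully connected network can score quickly. The feature set below (count, mean, maximum, and last amount relative to the mean) is a hypothetical illustration, not any firm's production features, and it assumes positive transaction amounts.

```python
from collections import defaultdict
from statistics import mean


def summary_vectors(transactions):
    """Per-customer features from (customer_id, timestamp, amount) tuples."""
    by_customer = defaultdict(list)
    for customer, _ts, amount in sorted(transactions, key=lambda t: t[1]):
        by_customer[customer].append(amount)  # amounts in time order
    features = {}
    for customer, amounts in by_customer.items():
        avg = mean(amounts)
        # [count, mean, max, last / mean]; a large final ratio flags unusual spend
        features[customer] = [len(amounts), avg, max(amounts), amounts[-1] / avg]
    return features


txns = [("a", 1, 10.0), ("a", 2, 10.0), ("a", 3, 40.0), ("b", 1, 5.0)]
print(summary_vectors(txns))
```

The raw, time-ordered `amounts` lists are what an RNN would consume instead; the summary vector trades history for lower in-line latency.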
62
CREDIT SCORING
Likelihood of default.
Incumbent firms are mostly trying to mine their existing data to more accurately predict repayment/prepayment/default behavior. They can leverage DL to find structure that ML models can then be retrained to exploit in an explainable manner.
Challengers/startups are looking to use DL to combine multiple data sources to develop models that produce either scores correlated with existing scores or their own “behavioral score”.
Frequent challenges around data bias introduced by past credit criteria and practices!
Explainability is a primary concern of many firms, especially under GDPR. Techniques including LIME, latent space clustering, nearest training set example, and other emerging network visualization research are being followed closely.
Discussion
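As a taste of local explainability, the sketch below probes a black-box score one feature at a time with finite differences around a single applicant. This is a simpler technique than LIME, which instead fits a weighted linear surrogate to many random perturbations. The two-feature default-probability model is hypothetical.

```python
import math


def sensitivity(model, x, delta=0.01):
    """Finite-difference sensitivity of a black-box score to each feature."""
    base = model(x)
    out = []
    for k in range(len(x)):
        bumped = list(x)
        bumped[k] += delta
        out.append((model(bumped) - base) / delta)
    return out


# Hypothetical default-probability model over [debt_to_income, years_of_history].
def toy_model(x):
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] - 0.2 * x[1])))


print(sensitivity(toy_model, [0.4, 5.0]))  # positive for debt ratio, negative for history
```

Sensitivities like these are only valid near the probed applicant; explaining a different applicant means probing again.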
63
STAC A2™ BENCHMARK
STAC A2 Benchmark
Developed by banks
Macro and micro, performance and accuracy
Pricing and Greeks for an American-exercise basket option with correlated Heston dynamics, using Longstaff-Schwartz Monte Carlo
Independently Audited Results
GPU Solution
The first system to handle the baseline problem size in "real time" (less than one second)
Please see http://www.stacresearch.com/a2 for more details of the STAC Benchmark
Also see https://devblogs.nvidia.com/parallelforall/american-option-pricing-monte-carlo-simulation/ for more details on Longstaff-Schwartz Monte Carlo on GPUs
64
DEEP LEARNING FOR
CUSTOMER SERVICE
OPERATIONS
AI-assisted and fully autonomous
customer interactions.
Integrates with leading customer
service software and
communications channels
Human Agents | Human Customer
Agent Console
Deep Neural Networks
AI model is trained on historical chat logs and customer service transcripts.
DigitalGenius
TensorFlow and PyTorch
Supervised
Unsupervised
65
GPU-ACCELERATED BERT
State-of-the-Art Natural Language Processing
BERT
AVAILABLE ON NGC
SUPER-HUMAN QUESTION ANSWERING | 280X FASTER TRAINING | REAL-TIME INFERENCE
Tasks: Question Answering, Translation, Dialog, Sentiment Analysis, Summarizing
[Chart: question answering accuracy of RM Reader, QANet, nlnet, BERT and Human; plotted values 86.6, 89.3, 90.1, 91.8]
Training: CPU Server 52 Hours | DGX-2 13 Mins
Inference: CPU Server 230 ms | T4 18 ms
Source: Question answering accuracy on SQuAD 1.1 for non-ensemble models
66
GPU ACCELERATED MACHINE LEARNING
FOR BOND PRICE PREDICTION
100 data elements per trade:
Trade size / historical
Coupon rate / time to maturity
Bond rating
Trade type buy/sell
Reporting delays
Current yield / yield to maturity
Launch as many CUDA threads as there are data elements, leveraging the 5,120 CUDA cores on V100 to run multiple kernels in parallel
http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php?searchByKeyword=s8655&searchItems=session_id&sessionTopic=&sessionEvent=&sessionYear=&sessionFormat=&submit=&select=
NEARLY 10X SPEED UP
OVER CPU IMPLEMENTATION
[Chart: bond trading price prediction, speed-up over CPU (0 to 10x) for unoptimized vs optimized CUDA]
67
APPLYING DEEP LEARNING TO
FINANCIAL MARKETS WITH NEWS DATA
Recording: http://on-demand.gputechconf.com/gtc/2017/video/s7696-andrew-tan-applying-deep-learning-to-financial-market-signal-identification-with-news-data.mp4
PDF: http://on-demand.gputechconf.com/gtc/2017/presentation/s7696_Andrew-Tan_FinancialMarketSignalIdentification.pdf
68
GPU ACCELERATED COMPUTING
HPC AI ML VISUALIZATION
CLARA NVIDIA AI RAPIDS ISAAC DRIVE METROPOLIS
WORKSTATIONS SERVERS CLOUD
CUDA & GPU COMPUTING ARCHITECTURE
NVIDIA GPU CLOUD
TESLA DGX TEGRA
69
NVIDIA DATA CENTER PLATFORM
Single Platform Drives Utilization and Productivity
CUSTOMER USE CASES
Consumer Internet & Industry Applications: Speech, Translate, Recommender; Manufacturing, Healthcare, Finance
Scientific Applications: Molecular Simulations, Weather Forecasting, Seismic Mapping
Virtual Graphics: Creative & Technical, Knowledge Workers
APPS & FRAMEWORKS: +550 applications (e.g. Amber, NAMD); +600 applications (vAPPS, DX/OGL)
CUDA-X / NVIDIA LIBRARIES
Machine Learning: cuML, cuDF, cuGRAPH
Deep Learning: cuDNN, CUTLASS, TensorRT
HPC: cuFFT, OpenACC
Virtual GPU: vDWS, vPC
CUDA & CORE LIBRARIES: cuBLAS | NCCL
GPUs & SYSTEMS: Tesla GPU, NVIDIA DGX Family, NVIDIA HGX, System OEM, Cloud
70
TRADITIONAL
DATA SCIENCE
CLUSTER
Workload Profile:
• 192GB mortgage data set
• 16 years, 68 quarters
• 34.7 Million single family mortgage loans
• 1.85 Billion performance records
• XGBoost training set: 50 features
300 Servers | $3M | 180 kW
71
GPU-ACCELERATED
MACHINE
LEARNING
CLUSTER
1 DGX-2 | 10 kW
1/8 the Cost | 1/15 the Space
1/18 the Power
DGX-2 and RAPIDS for
Predictive Analytics
[Chart: end-to-end time for 20, 30, 50 and 100 CPU nodes vs 5x DGX-1 and DGX-2; axis 0 to 10,000]
2 PFLOPS | 512GB HBM2 | 10 kW | 350 lbs
NVIDIA DGX-2
73
©2018 VMware, Inc.
OPTIMIZED
SOFTWARE
FASTER
DEPLOYMENTS
Eliminates installations.
Simply Pull & Run the
app
Key DL frameworks
updated monthly for perf
optimization
Empowers users to
deploy the latest versions
with IT support
Better Insights and faster
time-to-solution
NGC – SIMPLIFYING AI & HPC WORKFLOWS
ZERO
MAINTENANCE
HIGHER
PRODUCTIVITY
EMBEDDING
EXPERTISE
Deliver greater value,
faster
74
75
GET STARTED WITH NGC
Deploy containers:
ngc.nvidia.com
Learn more about NGC offering:
nvidia.com/ngc
Technical information:
developer.nvidia.com
Explore the NGC Registry for DL, ML & HPC
76
6 QUESTIONS FACING EVERY AI ENTERPRISE
Top Challenges for AI, Big Data, and Enterprise Transformation
Is your data doubling each year?
DATA DELUGE
Are you an intelligent enterprise needing
real time predictive analytics?
DELAYED INTELLIGENCE
Is your CAPEX budget shrinking amidst
escalating infrastructure demand?
SHRINKING BUDGET
Is ML training prohibitively long, delaying
time-to-predictions?
PROLONGED TRAINING TIME
Are Spark workloads creating relentless
infrastructure sprawl?
COMPLEX WORKLOADS
Do you have oceans of data that take
lifetimes to wrangle?
TEDIOUS DATA PREP
RAPIDS
GPU Accelerated End-to-End Data Science
RAPIDS is a set of open source libraries for GPU accelerating
data preparation and machine learning.
OSS website: rapids.ai
GPU Memory
Data Preparation → Model Training → Visualization
cuGraph
Graph Analytics
cuML
Machine Learning
cuDF
Data Preparation
78
DATA SCIENCE WORKFLOW WITH RAPIDS
Open Source, End-to-end GPU-accelerated Workflow Built On CUDA
DATA
DATA PREPARATION
GPU-accelerated compute for in-memory data preparation
Simplified implementation using familiar data science tools
Python drop-in pandas replacement built on CUDA C++; GPU-accelerated Spark (in development)
PREDICTIONS
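Much of data preparation is defensive parsing, such as the "unexpected null values stored as string" in the day-in-the-life workflow earlier. The stdlib sketch below converts string-encoded nulls to `None` while loading a CSV column; the token list and the loan data are assumptions to adjust per dataset. In pandas the same job is usually done with `read_csv(..., na_values=[...])`, and cuDF aims to mirror the pandas API on the GPU, though coverage of individual options should be checked against its documentation.

```python
import csv
import io

NULL_TOKENS = {"", "null", "none", "nan", "n/a", "na"}  # assumption: adjust per dataset


def parse_number(raw):
    """Turn a CSV cell into a float, or None if it is a string-encoded null."""
    if raw.strip().lower() in NULL_TOKENS:
        return None
    return float(raw)


def load_column(csv_text, column):
    reader = csv.DictReader(io.StringIO(csv_text))
    return [parse_number(row[column]) for row in reader]


data = "loan_id,balance\n1,1000.5\n2,NULL\n3,\n4,250\n"
balances = load_column(data, "balance")
print(balances)  # [1000.5, None, None, 250.0]
```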
79
DATA SCIENCE WORKFLOW WITH RAPIDS
Open Source, End-to-end GPU-accelerated Workflow Built On CUDA
MODEL TRAINING
GPU-acceleration of today’s most popular ML algorithms
XGBoost, PCA, Kalman, K-means, k-NN, DBSCAN, tSVD …
DATA PREDICTIONS
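K-means, one of the algorithms listed above, alternates two steps until assignments settle: assign each point to its nearest centroid, then move each centroid to the mean of its points. The stdlib sketch below is a CPU reference for the algorithm (Lloyd's iteration), not cuML code; cuML's GPU implementation is intended to follow a scikit-learn-style interface. The toy points are made up.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, 2)
print(labels)
```

Both steps are independent per point or per cluster, which is why k-means parallelizes well on a GPU.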
80
DATA SCIENCE WORKFLOW WITH RAPIDS
Open Source, End-to-end GPU-accelerated Workflow Built On CUDA
VISUALIZATION
Effortless exploration of datasets, billions of records in milliseconds
Dynamic interaction with data = faster ML model development
Data visualization ecosystem (Graphistry & OmniSci), integrated with RAPIDS
DATA PREDICTIONS
81
www.RAPIDS.ai
82
PILLARS OF RAPIDS PERFORMANCE
CUDA Architecture NVLink/NVSwitch Integrated Software
Massively Parallel Processing | High-Speed Connection between GPUs for Distributed Algorithms
Fully Integrated Software and
Hardware for Instant Productivity
NVSwitch
6x
NVLink
CUDA
PYTHON
APACHE ARROW on GPU Memory
DASK
cuDNN
RAPIDS
cuML cuDF
DL
FRAMEWORKS
83
NEW TURING TENSOR CORE
MULTI-PRECISION FOR AI INFERENCE & SCALE-OUT TRAINING
65 TFLOPS FP16 | 130 TeraOPS INT8 | 260 TeraOPS INT4
84
TENSOR CORE AUTOMATIC MIXED PRECISION
Over 3x Speedup With Just Two Lines of Code
TOOLS AND LIBRARIES
MAINTAIN NETWORK ACCURACY
TRAINING SPEEDUP OVER 3X | INFERENCE SPEEDUP OVER 4X
[Chart: PyTorch GNMT training throughput (total tokens/sec) on 1x V100, FP32 vs mixed precision: 3.4X]
[Chart: TensorRT ResNet50 inference throughput (images/sec) on 1x V100, FP32 vs INT8 vs mixed precision: 4.4X at 7 ms latency]
Tensor Core | Journey Page | GitHub | Profiler Tools
85
Circa 2000 - Torch7 - 4th (using odd numbers only 1,3,5,7)
Web-scale learning in speech, image and video applications
Maintained by top researchers including
Soumith Chintala - Research Engineer @ Facebook
All the goodness of Torch7 with an intuitive Python frontend that focuses on rapid
prototyping, readable code & support for a wide variety of deep learning models.
https://pytorch.org/2018/05/02/road-to-1.0.html
86
Apex - A PyTorch Extension
● Goal: Raise PyTorch customer awareness and increase adoption of NVIDIA Tensor Cores
● Content: Provide an easy to use set of utility functions in PyTorch for mixed-precision optimizations
● Benefit: Few lines of code to achieve improved training speed while maintaining accuracy and
stability of single precision (Tensor Cores)
● Target audience: Deep learning researchers and developers of PyTorch with NVIDIA Volta
● Key Features: AMP (Automatic Mixed Precision) and Optimizer Wrapper (dynamic loss scaling and master parameters)
● Teams Involved: Leading NVIDIA PyTorch team and collaboration with external FB PyTorch team
Overview
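Dynamic loss scaling, one of the Apex features listed above, keeps small FP16 gradients from underflowing: multiply the loss by a scale factor before backprop, divide the gradients by it before the update, halve the scale and skip the step whenever gradients overflow to inf/NaN, and grow it again after a run of clean steps. Apex wraps this logic for you (for example in its `amp.scale_loss` context manager); the framework-free sketch below only illustrates the bookkeeping, and its default values are assumptions.

```python
import math


class DynamicLossScaler:
    """Bookkeeping for dynamic loss scaling (illustrative, framework-free)."""

    def __init__(self, init_scale=2.0 ** 15, growth_interval=2000):
        self.scale = init_scale                # multiply the loss by this before backprop
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def unscale_and_check(self, scaled_grads):
        """Return unscaled gradients, or None if this optimizer step must be skipped."""
        if any(math.isinf(g) or math.isnan(g) for g in scaled_grads):
            self.scale /= 2.0                  # overflow: back off and skip the step
            self._clean_steps = 0
            return None
        grads = [g / self.scale for g in scaled_grads]  # unscale with this step's scale
        self._clean_steps += 1
        if self._clean_steps % self.growth_interval == 0:
            self.scale *= 2.0                  # long run without overflow: try a larger scale
        return grads


scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
print(scaler.unscale_and_check([16.0, float("inf")]), scaler.scale)  # None 4.0
print(scaler.unscale_and_check([8.0, 4.0]), scaler.scale)            # [2.0, 1.0] 4.0
```

Keeping FP32 "master" copies of the weights, the other wrapper feature listed, complements this: the unscaled gradients are applied in FP32 so tiny updates are not rounded away.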
87
EVEN MORE SOFTWARE
88
DALI
Full input pipeline acceleration including
data loading and augmentation
Drop-in integration with direct plugins to frameworks: MXNet, TensorFlow, PyTorch
Portable workflows through multiple input formats and configurable graphs
Unblock CPU with GPU-accelerated DL pre-processing library
Version 0.1 supports:
• ResNet-50 image classification & segmentation training
• Input formats: JPEG, LMDB, RecordIO, TFRecord
• Python APIs to define, build and run an input pipeline
Available as open source: https://github.com/NVIDIA/DALI
• DALI Samples & Tutorial: https://github.com/NVIDIA/DALI/blob/master/examples/Getting%20started.ipynb
• nvJPEG (Webpage, Documentation): https://developer.nvidia.com/nvjpeg
89
AI INFERENCE NEEDS TO RUN EVERYWHERE
Training → DNN Model → Inferencing
90
T4: UNIVERSAL INFERENCE ACCELERATOR
91
ANNOUNCING NVIDIA T4 ON
AMAZON AWS
92
93
Learn more here:
https://nvidia.com/data-center-inference
https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html
Get the ready-to-deploy container with monthly updates
from the NGC container registry:
https://ngc.nvidia.com/catalog/containers/nvidia%2Ftensorrtserver
Open source GitHub repository:
https://github.com/NVIDIA/tensorrt-inference-server
LEARN MORE AND DOWNLOAD TO USE
94
developer.nvidia.com
Fundamentals
Accelerated Computing
Game Development &
Digital Content
Finance
NVIDIA DEEP LEARNING
INSTITUTE
Online self-paced labs and instructor-led
workshops on deep learning and
accelerated computing
Take self-paced labs at
www.nvidia.co.uk/dlilabs
View upcoming workshops and request a
workshop onsite at www.nvidia.co.uk/dli
Educators can join the University
Ambassador Program to teach DLI courses
on campus and access resources. Learn
more at www.nvidia.com/dli
Intelligent Video
Analytics
Healthcare
Robotics
Autonomous Vehicles
Virtual Reality
96
NVIDIA
INCEPTION
PROGRAM
Accelerates AI startups with a boost of
GPU tools, tech and deep learning expertise
Startup Qualifications
Driving advances in the field of AI
Business plan
Incorporated
Web presence
Technology
DL startup kit*
Pascal Titan X
Deep Learning Institute (DLI) credit
Connect with a DL tech expert
DGX-1 ISV discount*
Software release notification
Live webinar and office hours
*By application
Marketing
Inclusion in NVIDIA marketing efforts
GPU Technology Conference (GTC)
discount
Emerging Company Summit (ECS)
participation+
Marketing kit
One-page story template
eBook template
Inception web badge and banners
Social promotion request form
Event opportunities list
Promotion at industry events
GPU ventures+
+By invitation
www.nvidia.com/inception
97
alowndes@nvidia.com

AI in the Financial Services Industry

  • 1.
    1 ALISON B LOWNDES AIDevRel | EMEA @alisonblowndes March 2019
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
    6 The day job AUTOMOTIVE Autosensors reporting location, problems COMMUNICATIONS Location-based advertising CONSUMER PACKAGED GOODS Sentiment analysis of what’s hot, problems $ FINANCIAL SERVICES Risk & portfolio analysis New products EDUCATION & RESEARCH Experiment sensor analysis HIGH TECHNOLOGY / INDUSTRIAL MFG. Mfg. quality Warranty analysis LIFE SCIENCES MEDIA/ENTERTAINMENT Viewers / advertising effectiveness ON-LINE SERVICES / SOCIAL MEDIA People & career matching HEALTH CARE Patient sensors, monitoring, EHRs OIL & GAS Drilling exploration sensor analysis RETAIL Consumer sentiment TRAVEL & TRANSPORTATION Sensor analysis for optimal traffic flows UTILITIES Smart Meter analysis for network capacity, LAW ENFORCEMENT & DEFENSE Threat analysis - social media monitoring, photo analysis
  • 7.
  • 8.
    8 NVIDIA CUDA-X AIECOSYSTEM FRAMEWORKS CLOUD DEPLOYMENT Workstation CloudServer DA GRAPH DL TRAINML DL INFERENCE Amazon SageMaker Serving Amazon SageMaker Neo Google Cloud ML CUDA-X AI CUDA Azure Machine Learning
  • 9.
    9 CUDA X ECOSYSTEM PRogrammableAcceleration across multiple Domains with one Architecture (PRADA) Specialized PerformanceEase of use FrameworksApplications Libraries Directives and Standard Languages Extended Standard Languages CUDA-C++ CUDA Fortran GPU Users Domain Specialists Problem Specialists New Algorithm Developers and Optimization Experts
  • 10.
    10 THE NEW NGC GPU-optimizedSoftware Hub. Simplifying DL, ML and HPC Workflows NGC 50+ Containers DL, ML, HPC 50+ Pre-trained Models NLP, Classification, Object Detection & more Industry Workflows Medical Imaging, Intelligent Video Analytics 10+ Model Training Scripts NLP, Image Classification, Object Detection & more Innovate Faster Deploy Anywhere Simplify Deployments
  • 11.
    11 TESLA V100 TENSOR COREGPU World’s Most Advanced Data Center GPU 5,120 CUDA cores 640 NEW Tensor cores 7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS 125 Tensor TFLOPS 32 GB HBM2 @ 900GB/s | 300GB/s NVLink
  • 12.
    12 THE PATH FORWARD CPU+ Accelerator Simulation + AIFull-stack Optimization FP64 + Multi-Precision + 5.3 7.810.6 15.7 21.2 125 P100 V100
  • 13.
  • 14.
    14 PRICING DERIVATIVES COMPARING CPUTO GPU European Options SPX Monte Carlo Simulation Louis Scott — March 2018 American Options SPY Finite Difference Method Finite Difference Method Hardware Compute Time Speed Up Factor Seconds Days to Option Expiration 105 CPU - i7 5630 146 GPU- V100 0.268 544 x faster Monte Carlo 2m simulations 4 options Hardware Compute Time Speed Up Factor Seconds Days to Option Expiration 105 CPU - i7 5630 47 GPU- V100 0.18 267 x faster http://on-demand.gputechconf.com/gtc/2018/video/S8123/
  • 15.
  • 16.
  • 17.
  • 18.
    18 12 6 39 GPU POWERED WORKFLOW DAY IN THELIFE OF A DATA SCIENTIST Train Model Validate Test Model Experiment with Optimizations and Repeat Go Home on Time Dataset Downloads Overnight Start GET A COFFEE Stay Late Restart Data Prep Workflow Again Find Unexpected Null Values Stored as String… Switch to Decaf 12 6 39 CPU POWERED WORKFLOW Restart Data Prep Workflow @*#! Forgot to Add a Feature ANOTHER… GET A COFFEE Start Data Prep Workflow GET A COFFEE Configure Data Prep Workflow Dataset Downloads Overnight Dataset Collection Analysis Data Prep Train Inference
  • 19.
    NATURAL LANGUAGE PROCESSING FORSIGNAL GENERATION ON NEWS DATA
  • 20.
    WORD EMBEDDINGS GLoVe –Global Vectors for Word Representation, utilizes the word-to-word co- occurrence statistics from a given corpus.
  • 22.
    22 Algorithmic Trading usingDeep Autoencoder based Statistical Arbitrage NVIDIA Deep Learning Institute
  • 23.
    23 Moving Average –Mean Reversion
  • 24.
  • 25.
  • 26.
    26 Snorkel https://github.com/HazyResearch/snorkel Automation of labellingdata A system for rapidly creating, modeling, and managing training data, For domains in which large labelled training sets are not available or easy to obtain. Learning, essentially, which labelling functions are more accurate than others—and then using this to train a DNN A general framework for many weak supervision techniques.
  • 27.
    27 AUTO ML: AICREATING AI http://automl.chalearn.org/
  • 28.
  • 29.
  • 30.
    30 WHAT IS ATENSOR? And why do they flow?
  • 31.
    31 WHAT IS ATENSOR? And why do they flow? Scalar is a list of numbers with 0 indices (length 1) a Vector is a list of numbers with 1 index of length k a[k] Matrix is a list of numbers with 2 indices of length r,c a[r,c] A Tensor is a list of numbers with n indices of length n1, n2, …, nm a[n1..nm] an n-dimensional array
  • 32.
    32 Origin of NeuralNetworks Input Output
  • 33.
    33 A Simple Neuron InputOutput Neuron x1 w2x2 y w1x1 x2
  • 34.
    34 Neural Network Basics 2ndstep: sum 1st step: Activations * Weights 3rd step: activate
  • 35.
    35 Combining Neurons x1 x2 x3 x4 x5 —Additional neuronscan be added to create a layer —Multiple layers can also be added, resulting in input, hidden, and output layers —Expanding the neural network size creates additional predictive power —In feed forward neural networks, neurons are fully connected to surrounding layers y Input Layer Output Layer Hidden Layers
  • 36.
    36 Deep Neural Networks(DNNs) x1 x2 x3 x4 x5 Input Layer Output LayerMany Hidden Layers y
  • 37.
    37 Image “Volvo XC90” Imagesource: “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks” ICML 2009 & Comm. ACM 2011. Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng. CONVOLUTIONAL NEURAL NETWORKS
  • 38.
    38Yann Le Cun(FaceBook) ISSCC, Feb 2019
  • 39.
  • 40.
  • 41.
    41 Li et al,University of Maryland & US Naval Academy https://arxiv.org/pdf/1712.09913.pdf
  • 42.
    42 AUTOENCODERS UNSUPERVISED feature learning Sparse Representation Training Data Reconstruction EncoderDecoder Minimize Reconstruction Error - InputLayer HiddenLayer HiddenLayer HiddenLayer BottleneckLayer HiddenLayer HiddenLayer HiddenLayer OutputLayer
  • 43.
    43 Long short-term memory(LSTM) Hochreiter (1991) analysed vanishing gradient “LSTM falls out of this almost naturally” Gates control importance of the corresponding activations Training via backprop unfolded in time LSTM: input gate output gate Long time dependencies are preserved until input gate is closed (-) and forget gate is open (O) forget gate Fig from Vinyals et al, Google April 2015 NIC Generator Fig from Graves, Schmidhuber et al, Supervised Sequence Labelling with RNNs
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
    48 Tabish Rashid etal, Oxford/DeepMind QMIX: https://arxiv.org/pdf/1803.11485.pdf
  • 49.
    49 DEEPMIND ALPHA* 1.0=> 2.0 => 0.0 * ..a deeply structured hybrid (Gary Marcus, Jan 2018)
  • 50.
  • 51.
  • 52.
  • 53.
    53 COMPUTATIONAL SCALE REQUIRED 3million labeled images 1 DGX-1 trains 300k labeled images on 1 DNN in 1 day 10 DNNs required for self-driving 10 parallel experiments at all times 100 DGX-1 per car
  • 54.
    54 INNOVATIONS & KEYMETRICS DISTILLED NVIDIA SATURNV REFERENCE Design Best Practices Lessons learned from building the worlds largest AI Infrastructure Reference Architectures Partner Reference Architectures, Scale up racks Product Innovation & Quality Rapid exploration & resolution of customer issues
  • 55.
  • 56.
    Illuminate Deep Networks [email protected] Original Image Most Certain Uncertain Layer273: Visualizing expected vs actual outputs Uncertain Inputs of two filters which are most uncertain
  • 57.
  • 58.
    58 DATA SCIENCE IN FINANCE AlphaStock Identification Analyze Consumers’ Behavior Anti-fraud API Service Insurance Campaign And Conversion Analysis Credit Card Application Approval Customer service chatbots/routing Claim Fraud Detection Evaluate Create Worthiness Fraud And Credit Risk Analysis Fraud Detection Hedge Fund Management Risk evaluation
  • 59.
    59 Financial investment forecastinginvolves processing vast amounts of data to derive predictions that can outperform the market. SpaceKnow is using GPU to extract global macro and micro-economic activity that helps build high-performance portfolios AI-POWERED INVESTMENT ALPHA
  • 60.
    60 EQUIFAX Equifax now hasNeuroDecision Technology “NeuroDecision Technology (NDT) is the first regulatory-compliant machine learning credit scoring system reviewed by regulators and credit scoring experts. This technology develops a neural network model that improves performance and accuracy, which gives customers the ability to make more informed business decisions when assessing risk.” https://investor.equifax.com/news-and-events/news/2018/03-26-2018-143044126 “The executive noted that the neural net has improved its ability to make predictive models by as much as 15 percent.” https://www.pymnts.com/innovation/2017/equifax-uses-deep-neural-machine-learning-to-improve-credit-scoring/ Neural nets can be 15% better for prediction
  • 61.
    61 FRAUD DETECTION Incumbent firmslooking at deep structured history of customers to do supervised learning on fraud / no fraud inline with transactions; some use of autoencoders and other networks to do latent space clustering to identify fraud after the fact. Mixed use of raw transactions over time (RNN) and transaction summary vectors (RNN and CNN) to train models. CNN or fully connected has advantages in-line w.r.t. transcation latency. DL shown to be able to dramatically reduce false positives in transactional fraud! Also use cases around Speech to Text transcription for insider trading etc. Commercial applications challenged today by industry jargon, accents, non-English or Mandarin but these challenges can be overcome. Discussion
  • 62.
    CREDIT SCORING (Discussion): Likelihood of default. Incumbent firms are mostly trying to mine their existing data to more accurately predict repayment/prepayment/default behavior; they can leverage DL to find structure that ML models can then be retrained to exploit in an explainable manner. Challengers and startups are looking to use DL to combine multiple data sources to develop models that produce either scores correlated with existing scores or their own "behavioral score". Data bias introduced by past credit criteria and practices is a frequent challenge! Explainability is a primary concern of many firms, especially under GDPR. Techniques including LIME, latent-space clustering, nearest training set example, and other emerging network visualization research are being followed closely.
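The "nearest training set example" explanation listed above can be sketched in a few lines: alongside a model's score, show the historical applicant closest to the query and that applicant's outcome. This is a minimal illustration under assumptions, not any firm's method: the function name and the three example features are hypothetical, and features are assumed to be already normalized to comparable scales (otherwise Euclidean distance is dominated by the largest-valued feature).

```python
import math
from typing import Sequence, Tuple


def nearest_training_example(
    query: Sequence[float],
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[str],
) -> Tuple[int, str, float]:
    """Return (index, label, distance) of the training example closest to `query`.

    Assumes features are pre-normalized; uses plain Euclidean distance.
    """
    if not train_x or len(train_x) != len(train_y):
        raise ValueError("train_x and train_y must be non-empty and of equal length")
    best = min(range(len(train_x)), key=lambda i: math.dist(query, train_x[i]))
    return best, train_y[best], math.dist(query, train_x[best])


# Hypothetical normalized features: (debt_to_income, utilisation, years_of_history)
train_x = [(0.2, 0.1, 0.9), (0.8, 0.9, 0.1), (0.5, 0.4, 0.5)]
train_y = ["repaid", "default", "repaid"]
idx, label, dist = nearest_training_example((0.75, 0.85, 0.2), train_x, train_y)
print(f"most similar past applicant #{idx}: {label} (distance {dist:.3f})")
```

A brute-force scan is fine for illustration; at portfolio scale the same lookup would typically use an indexed or GPU-accelerated k-NN search.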
  • 63.
    STAC-A2™ BENCHMARK: The STAC-A2 benchmark was developed by banks and covers macro and micro performance and accuracy: pricing and Greeks for an American-exercise basket option under correlated Heston dynamics, using Longstaff-Schwartz Monte Carlo. Results are independently audited. GPU solution: the first system to handle the baseline problem size in "real time" (less than one second). See http://www.stacresearch.com/a2 for more details of the STAC benchmark, and https://devblogs.nvidia.com/parallelforall/american-option-pricing-monte-carlo-simulation/ for more details on Longstaff-Schwartz Monte Carlo on GPUs.
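Longstaff-Schwartz Monte Carlo prices an American option by simulating paths forward, then stepping backward: at each exercise date it regresses the discounted future cash flows of in-the-money paths on basis functions of the current price, and exercises where the immediate payoff beats that estimated continuation value. The sketch below is a deliberately simplified, single-threaded illustration, not the STAC-A2 or NVIDIA implementation: it prices a single-asset American put under geometric Brownian motion (not the benchmark's correlated Heston basket), uses a quadratic basis in S/K, and its function names are hypothetical.

```python
import math
import random
from typing import List


def _solve_3x3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular regression matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def lsm_american_put(s0: float, strike: float, rate: float, vol: float,
                     maturity: float, steps: int = 50, paths: int = 10_000,
                     seed: int = 42) -> float:
    rng = random.Random(seed)
    dt = maturity / steps
    disc = math.exp(-rate * dt)  # one-step discount factor
    drift = (rate - 0.5 * vol * vol) * dt
    shock = vol * math.sqrt(dt)

    # Simulate risk-neutral GBM paths: s[p][t] for t = 0..steps.
    s = []
    for _ in range(paths):
        path = [s0]
        for _ in range(steps):
            path.append(path[-1] * math.exp(drift + shock * rng.gauss(0.0, 1.0)))
        s.append(path)

    # Each path's cash flow, valued at the current time step; start at maturity.
    cash = [max(strike - path[-1], 0.0) for path in s]

    for t in range(steps - 1, 0, -1):
        cash = [c * disc for c in cash]  # roll cash flows back from t+1 to t
        itm = [p for p in range(paths) if s[p][t] < strike]
        if len(itm) < 3:
            continue
        # Least-squares fit of continuation value on (1, x, x^2), x = S/K,
        # over in-the-money paths only, via the normal equations.
        ata = [[0.0] * 3 for _ in range(3)]
        aty = [0.0] * 3
        for p in itm:
            x = s[p][t] / strike
            basis = (1.0, x, x * x)
            for i in range(3):
                aty[i] += basis[i] * cash[p]
                for j in range(3):
                    ata[i][j] += basis[i] * basis[j]
        b0, b1, b2 = _solve_3x3(ata, aty)
        for p in itm:
            x = s[p][t] / strike
            exercise = strike - s[p][t]
            if exercise > b0 + b1 * x + b2 * x * x:  # exercise beats continuation
                cash[p] = exercise

    price = disc * sum(cash) / paths  # discount from t = 1 to t = 0
    return max(price, strike - s0, 0.0)  # allow immediate exercise at t = 0
```

With S0 = 36, K = 40, r = 0.06, σ = 0.2 and T = 1, the result should land near the roughly 4.48 reported for this case by Longstaff and Schwartz (2001), up to Monte Carlo noise. The path simulation and the per-path regression sums are the parts that map naturally onto GPU threads.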
  • 64.
    DEEP LEARNING FOR CUSTOMER SERVICE OPERATIONS (DigitalGenius): AI-assisted and fully autonomous customer interactions. Integrates with leading customer service software and communication channels. Diagram: human customer, deep neural networks, agent console, human agents. The AI model is trained on historical chat logs and customer service transcripts, using TensorFlow and PyTorch with supervised and unsupervised learning.
  • 65.
    GPU-ACCELERATED BERT: State-of-the-Art Natural Language Processing. BERT AVAILABLE ON NGC. Question answering, translation, dialog, sentiment analysis, summarizing. SUPER-HUMAN QUESTION ANSWERING [Chart: question-answering accuracy on SQuAD 1.1 for non-ensemble models RM Reader, QANet, nlnet, BERT and Human; plotted values 86.6, 89.3, 90.1, 91.8]. 280X FASTER TRAINING: 52 hours on a CPU server vs. 13 mins on DGX-2. REAL-TIME INFERENCE: 230 ms on a CPU server vs. 18 ms on T4. Source: question answering accuracy on SQuAD 1.1 for non-ensemble models.
  • 66.
    GPU ACCELERATED MACHINE LEARNING FOR BOND PRICE PREDICTION: 100 data elements per trade, including trade size / history, coupon rate / time to maturity, bond rating, trade type (buy/sell), reporting delays, and current yield / yield to maturity. Launch as many CUDA threads as there are data elements, and leverage the 5,120 cores on V100 to run multiple kernels in parallel. NEARLY 10X SPEED UP OVER CPU IMPLEMENTATION. [Charts: bond trading price; speedup over CPU for unoptimized vs. optimized CUDA] Session recording: http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php?searchByKeyword=s8655&searchItems=session_id&sessionTopic=&sessionEvent=&sessionYear=&sessionFormat=&submit=&select=
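The "one CUDA thread per data element" mapping can be illustrated with a CPU-side Python simulation of a 1-D kernel launch: the grid is sized by ceiling division, each thread computes its global index, and surplus threads in the last block do nothing. This is a hedged sketch, not the presenters' code: the block size of 256, the helper names, and the choice of current yield (annual coupon divided by price, per 100 of face value) as the per-element computation are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def blocks_for(n_elements: int, threads_per_block: int = 256) -> int:
    """Grid size so that every data element gets its own thread (ceiling division)."""
    return math.ceil(n_elements / threads_per_block)


def simulate_elementwise_kernel(data: Sequence[T], fn: Callable[[T], float],
                                threads_per_block: int = 256) -> List[float]:
    """CPU simulation of a 1-D CUDA launch: one 'thread' per element of `data`."""
    n = len(data)
    out = [0.0] * n
    for block_idx in range(blocks_for(n, threads_per_block)):
        for thread_idx in range(threads_per_block):
            # Equivalent of blockIdx.x * blockDim.x + threadIdx.x in a CUDA kernel.
            i = block_idx * threads_per_block + thread_idx
            if i < n:  # bounds check for the last, partially filled block
                out[i] = fn(data[i])
    return out


def current_yield(trade: Tuple[float, float], face: float = 100.0) -> float:
    """Hypothetical per-element work: annual coupon payment divided by market price."""
    coupon_rate, price = trade
    return coupon_rate * face / price


trades = [(0.05, 95.0), (0.04, 100.0)]  # (coupon rate, clean price per 100 face)
print(simulate_elementwise_kernel(trades, current_yield))
```

On a real GPU all of these "threads" run concurrently; the sequential loops here only make the index arithmetic explicit.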
  • 67.
    APPLYING DEEP LEARNING TO FINANCIAL MARKETS WITH NEWS DATA. Recording: http://on-demand.gputechconf.com/gtc/2017/video/s7696-andrew-tan-applying-deep-learning-to-financial-market-signal-identification-with-news-data.mp4 PDF: http://on-demand.gputechconf.com/gtc/2017/presentation/s7696_Andrew-Tan_FinancialMarketSignalIdentification.pdf
  • 68.
    GPU ACCELERATED COMPUTING: HPC | AI | ML | VISUALIZATION; CLARA | NVIDIA AI | RAPIDS | ISAAC | DRIVE | METROPOLIS; WORKSTATIONS | SERVERS | CLOUD; CUDA & GPU COMPUTING ARCHITECTURE; NVIDIA GPU CLOUD; TESLA | DGX | TEGRA
  • 69.
    NVIDIA DATA CENTER PLATFORM: Single Platform Drives Utilization and Productivity. CUSTOMER USE CASES: speech, translate, recommender; SCIENTIFIC APPLICATIONS: molecular simulations, weather forecasting, seismic mapping; CONSUMER INTERNET & INDUSTRY APPLICATIONS: manufacturing, healthcare, finance; VIRTUAL GRAPHICS: creative & technical knowledge workers. APPS & FRAMEWORKS. CUDA-X NVIDIA LIBRARIES: VIRTUAL GPU (vDWS, vPC, vAPPS); DEEP LEARNING (cuDNN, CUTLASS, TensorRT); MACHINE LEARNING (cuML, cuDF, cuGRAPH); HPC (cuFFT, OpenACC, +550 applications including Amber and NAMD); VIRTUAL GRAPHICS (DX/OGL, +600 applications); CUDA & CORE LIBRARIES (cuBLAS, NCCL). GPUs & SYSTEMS: Tesla GPU, NVIDIA DGX family, NVIDIA HGX, system OEM, cloud.
  • 70.
    TRADITIONAL DATA SCIENCE CLUSTER: 300 servers | $3M | 180 kW. Workload profile: 192 GB mortgage data set • 16 years, 68 quarters • 34.7 million single-family mortgage loans • 1.85 billion performance records • XGBoost training set: 50 features
  • 71.
    GPU-ACCELERATED MACHINE LEARNING CLUSTER: 1 DGX-2 | 10 kW. 1/8 the cost | 1/15 the space | 1/18 the power. DGX-2 and RAPIDS for predictive analytics. [Chart: end-to-end time (axis 0 to 10,000) for 20, 30, 50 and 100 CPU nodes vs. DGX-2 and 5x DGX-1]
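The power and cost ratios can be checked against the 300-server, $3M, 180 kW CPU cluster on the previous slide. The power ratio follows directly from the two quoted figures; the DGX-2 cost below is only what the "1/8 the cost" claim implies, not a quoted price, and the space ratio cannot be checked from the figures given.

```latex
\frac{P_{\text{CPU cluster}}}{P_{\text{DGX-2}}} = \frac{180\ \text{kW}}{10\ \text{kW}} = 18,
\qquad
C_{\text{DGX-2}} \approx \frac{C_{\text{CPU cluster}}}{8} = \frac{\$3{,}000{,}000}{8} = \$375{,}000 \ \text{(implied)}
```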
  • 72.
    NVIDIA DGX-2: 2 PFLOPS | 512 GB HBM2 | 10 kW | 350 lbs
  • 73.
    NGC – SIMPLIFYING AI & HPC WORKFLOWS: deliver greater value, faster. OPTIMIZED SOFTWARE | FASTER DEPLOYMENTS | ZERO MAINTENANCE | HIGHER PRODUCTIVITY | EMBEDDING EXPERTISE. Eliminates installations: simply pull & run the app. Key DL frameworks are updated monthly for performance optimization. Empowers users to deploy the latest versions with IT support. Better insights and faster time-to-solution. (©2018 VMware, Inc.)
  • 74.
  • 75.
    GET STARTED WITH NGC: Explore the NGC Registry for DL, ML & HPC. Deploy containers: ngc.nvidia.com. Learn more about the NGC offering: nvidia.com/ngc. Technical information: developer.nvidia.com.
  • 76.
    6 QUESTIONS FACING EVERY AI ENTERPRISE: Top Challenges for AI, Big Data, and Enterprise Transformation. DATA DELUGE: Is your data doubling each year? DELAYED INTELLIGENCE: Are you an intelligent enterprise needing real-time predictive analytics? SHRINKING BUDGET: Is your CAPEX budget shrinking amid escalating infrastructure demand? PROLONGED TRAINING TIME: Is ML training prohibitively long, delaying time-to-predictions? COMPLEX WORKLOADS: Are Spark workloads creating relentless infrastructure sprawl? TEDIOUS DATA PREP: Do you have oceans of data that take lifetimes to wrangle?
  • 77.
    RAPIDS: GPU-Accelerated End-to-End Data Science. RAPIDS is a set of open source libraries for GPU-accelerating data preparation and machine learning. OSS website: rapids.ai. Diagram: data preparation, model training and visualization sharing GPU memory, via cuDF (data preparation), cuML (machine learning) and cuGraph (graph analytics).
  • 78.
    DATA SCIENCE WORKFLOW WITH RAPIDS: open source, end-to-end GPU-accelerated workflow built on CUDA, from data to predictions. DATA PREPARATION: GPU-accelerated compute for in-memory data preparation; simplified implementation using familiar data science tools; a Python drop-in pandas replacement built on CUDA C++; GPU-accelerated Spark (in development).
  • 79.
    DATA SCIENCE WORKFLOW WITH RAPIDS: open source, end-to-end GPU-accelerated workflow built on CUDA, from data to predictions. MODEL TRAINING: GPU acceleration of today's most popular ML algorithms: XGBoost, PCA, Kalman, K-means, k-NN, DBSCAN, tSVD …
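K-means, one of the algorithms listed above, alternates two steps that parallelize naturally across points: assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points. The plain-Python sketch below (Lloyd's algorithm) only shows what is being computed; it is not cuML's API or implementation, and the function name, random-sample initialization, and keep-old-centroid handling of empty clusters are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm; returns (centroids, label of each point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: independent per point (the GPU-friendly part).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so centroids are too: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # empty cluster keeps its old centroid
        centroids = updated
    return centroids, labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(kmeans(pts, 2))
```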
  • 80.
    DATA SCIENCE WORKFLOW WITH RAPIDS: open source, end-to-end GPU-accelerated workflow built on CUDA, from data to predictions. VISUALIZATION: effortless exploration of datasets, billions of records in milliseconds; dynamic interaction with data = faster ML model development; data visualization ecosystem (Graphistry & OmniSci), integrated with RAPIDS.
  • 81.
  • 82.
    PILLARS OF RAPIDS PERFORMANCE: CUDA architecture (massively parallel processing) | NVLink/NVSwitch (high-speed connections between GPUs for distributed algorithms) | Integrated software (fully integrated software and hardware for instant productivity). Diagram: NVSwitch, 6x NVLink; software stack of CUDA, Python, Apache Arrow on GPU memory, Dask, cuDNN, RAPIDS (cuML, cuDF), DL frameworks.
  • 83.
    NEW TURING TENSOR CORE: multi-precision for AI inference & scale-out training. 65 TFLOPS FP16 | 130 TeraOPS INT8 | 260 TeraOPS INT4
  • 84.
    TENSOR CORE AUTOMATIC MIXED PRECISION: over 3x speedup with just two lines of code. Tools and libraries maintain network accuracy. TRAINING SPEEDUP OVER 3X: PyTorch GNMT total tokens/sec on 1x V100, mixed precision vs. FP32, 3.4x. INFERENCE SPEEDUP OVER 4X: TensorRT ResNet-50 images/sec on 1x V100, INT8/mixed vs. FP32, 4.4x at 7 ms latency. Resources: Tensor Core journey page, GitHub, profiler tools.
  • 85.
    Circa 2000: Torch7 is the 4th version (using odd numbers only: 1, 3, 5, 7). Web-scale learning in speech, image and video applications. Maintained by top researchers including Soumith Chintala, Research Engineer @ Facebook. PyTorch: all the goodness of Torch7 with an intuitive Python frontend that focuses on rapid prototyping, readable code & support for a wide variety of deep learning models. https://pytorch.org/2018/05/02/road-to-1.0.html
  • 86.
    Apex - A PyTorch Extension (Overview) ● Goal: raise PyTorch customer awareness and increase adoption of NVIDIA Tensor Cores ● Content: provide an easy-to-use set of utility functions in PyTorch for mixed-precision optimizations ● Benefit: a few lines of code achieve improved training speed (Tensor Cores) while maintaining the accuracy and stability of single precision ● Target audience: deep learning researchers and PyTorch developers using NVIDIA Volta ● Key features: AMP (Automatic Mixed Precision) and an optimizer wrapper (dynamic loss scaling and master parameters) ● Teams involved: led by the NVIDIA PyTorch team in collaboration with the external FB PyTorch team
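The optimizer wrapper's dynamic loss scaling and master parameters can be sketched in plain Python. This is a framework-free illustration, not Apex's API: the class and function names are hypothetical, gradients are plain floats, and the defaults (initial scale 2^16, halve on overflow, double after 2,000 clean steps) are assumptions mirroring commonly used values. The loss is multiplied by the scale before the FP16 backward pass so small gradients do not underflow; a step whose scaled gradients overflowed (inf/NaN) is skipped and the scale reduced, while clean gradients are unscaled and applied to full-precision master weights.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DynamicLossScaler:
    scale: float = 2.0 ** 16
    growth_factor: float = 2.0
    backoff_factor: float = 0.5
    growth_interval: int = 2000
    good_steps: int = 0

    def update(self, grads_are_finite: bool) -> None:
        """Adjust the scale after a step has been applied or skipped."""
        if not grads_are_finite:
            self.scale *= self.backoff_factor  # overflow: shrink the scale
            self.good_steps = 0
            return
        self.good_steps += 1
        if self.good_steps == self.growth_interval:
            self.scale *= self.growth_factor  # long clean run: try a larger scale
            self.good_steps = 0


def apply_scaled_gradients(master: List[float], scaled_grads: Sequence[float],
                           scaler: DynamicLossScaler, lr: float) -> List[float]:
    """One SGD step on FP32 master weights from loss-scaled gradients."""
    finite = all(math.isfinite(g) for g in scaled_grads)
    if finite:
        # Unscale with the same scale used for the backward pass, then step.
        master = [w - lr * g / scaler.scale for w, g in zip(master, scaled_grads)]
    scaler.update(finite)  # skipped step if not finite; scale adjusted either way
    return master
```

Unscaling happens before `update` so that a scale increase on this step cannot corrupt the gradients being applied.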
  • 87.
  • 88.
    DALI: full input pipeline acceleration, including data loading and augmentation. Drop-in integration with direct plugins to frameworks: MXNet, TensorFlow, PyTorch. Portable workflows through multiple input formats and configurable graphs. Unblocks the CPU with a GPU-accelerated DL pre-processing library. Version 0.1 supports: • ResNet-50 image classification & segmentation training • Input formats: JPEG, LMDB, RecordIO, TFRecord • Python APIs to define, build and run an input pipeline. Available as open source: https://github.com/NVIDIA/DALI • DALI samples & tutorial: https://github.com/NVIDIA/DALI/blob/master/examples/Getting%20started.ipynb • nvJPEG (webpage, documentation): https://developer.nvidia.com/nvjpeg
  • 89.
    AI INFERENCE NEEDS TO RUN EVERYWHERE. Diagram: training, DNN model, inferencing.
  • 90.
  • 91.
    ANNOUNCING NVIDIA T4 ON AMAZON AWS. GTC: pre-announce during the keynote; the customer sign-up page will go live post-keynote.
  • 92.
  • 93.
    LEARN MORE AND DOWNLOAD TO USE. Learn more here: https://nvidia.com/data-center-inference and https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html. Get the ready-to-deploy container with monthly updates from the NGC container registry: https://ngc.nvidia.com/catalog/containers/nvidia%2Ftensorrtserver. Open source GitHub repository: https://github.com/NVIDIA/tensorrt-inference-server
  • 94.
    developer.nvidia.com
  • 95.
    NVIDIA DEEP LEARNING INSTITUTE: online self-paced labs and instructor-led workshops on deep learning and accelerated computing. Topics: Fundamentals | Accelerated Computing | Game Development & Digital Content | Finance | Intelligent Video Analytics | Healthcare | Robotics | Autonomous Vehicles | Virtual Reality. Take self-paced labs at www.nvidia.co.uk/dlilabs. View upcoming workshops and request an onsite workshop at www.nvidia.co.uk/dli. Educators can join the University Ambassador Program to teach DLI courses on campus and access resources. Learn more at www.nvidia.com/dli
  • 96.
    NVIDIA INCEPTION PROGRAM: accelerates AI startups with a boost of GPU tools, tech and deep learning expertise. Startup qualifications: driving advances in the field of AI • business plan • incorporated • web presence. Technology: DL startup kit* • Pascal Titan X • Deep Learning Institute (DLI) credit • connect with a DL tech expert • DGX-1 ISV discount* • software release notification • live webinar and office hours. Marketing: inclusion in NVIDIA marketing efforts • GPU Technology Conference (GTC) discount • Emerging Company Summit (ECS) participation+ • marketing kit • one-page story template • eBook template • Inception web badge and banners • social promotion request form • event opportunities list • promotion at industry events. GPU Ventures+. *By application; +By invitation. www.nvidia.com/inception
  • 97.