Deep Learning at NMC
Devin Jones
Director, Machine Learning Lab, Nielsen
Devin Jones
● Machine Learning & Statistics
○ Research
■ Classification
■ Inference
■ Time Series
○ Application
■ Large scale
■ Streaming
Introduction
● Columbia University
○ CS/ML
● Rutgers University
○ Statistics
○ Econ
○ Operations
Research
● Ad Tech (7 years)
Agenda
● ML at NMC
● Intro to Deep Learning
● Deep Learning Research at NMC
Machine Learning at NMC
“Used to build larger audiences from smaller audience segments to create reach for advertisers. In theory, they reflect similar characteristics to a benchmark set of characteristics the original audience segment represents, such as in-market kitchen-appliance shoppers.”
Source: adage.com
The ML Challenge at NMC
Look Alike Modeling
The ML Challenge at NMC: An Example
Supervised Classification for Online Ad Targeting
ML at NMC
Data
Algorithm
Model
Supervised What?
Machine Learning has two main categories:
Supervised Learning: Inferences on Labeled Data
Unsupervised Learning: Inferences on Unlabeled Data
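As a minimal sketch of the distinction (toy one-feature data and thresholds invented purely for illustration):

```python
import numpy as np

# Toy data: one feature per email (e.g. count of spam trigger words).
X = np.array([0.1, 0.2, 0.15, 2.9, 3.1, 3.0])

# Supervised: labels are provided (1 = spam, 0 = ham), so we can learn
# a decision rule directly from the labeled examples.
y = np.array([0, 0, 0, 1, 1, 1])
spam_mean = X[y == 1].mean()
ham_mean = X[y == 0].mean()

def predict(x):
    # Classify by nearest class mean.
    return int(abs(x - spam_mean) < abs(x - ham_mean))

# Unsupervised: no labels; group the same points purely by similarity
# (a single split around the range midpoint, k-means style).
midpoint = (X.min() + X.max()) / 2
clusters = (X > midpoint).astype(int)
```

The supervised rule uses the labels `y`; the unsupervised split recovers the same grouping from the data alone.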
Supervised vs Unsupervised Learning
Supervised: Spam or Ham?
Unsupervised: Clustering Wikipedia Articles
NMC high-level architecture
Machine Learning
The Feature Set & Scale
The quality of the data used for a model will influence the model’s success.
At NMC, we have access to high-dimensional, sparse data:
~4,000 Segments + ~200 Publishers + User Agent + Geographic Info (zip code)
Resulting in over 100k features to choose from
Models are trained in batches of 100,000 to 100,000,000 users, depending on the purpose.
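A sketch of how such a sparse, high-dimensional user record might be represented; the vocabulary, token names, and column indices below are hypothetical:

```python
# Hypothetical token-to-column mapping; the real NMC vocabulary is not
# public. Each segment/publisher/geo token gets one column in a space
# of 100k+ features, per the slide.
vocab = {
    "segment: Likes Outdoors": 0,
    "segment: Male 25-35": 1,
    "publisher: example-news": 2,
    "geo: 10001": 3,
}

def featurize(tokens):
    # A sparse row as {column: value}: storing only the non-zero entries
    # keeps a 100k-wide vector tiny when a user has a handful of features.
    return {vocab[t]: 1.0 for t in tokens if t in vocab}

x = featurize(["segment: Likes Outdoors", "geo: 10001"])
```

Only the active features are stored, which is what makes the 100k-wide feature space tractable.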
ML Algorithms at NMC
To date, we have implemented these algorithms in our real-time scoring engine:
● Binary Linear Models
● Multinomial Linear Models
● Online Learning for Linear Models
● kNN
● Random Forest
And of course… Deep Learning
We score billions of events per day using these models and our ML infrastructure.
Deep Learning at NMC
Topics
● Motivation
● Intro to DL
● NN Architecture
● GPU vs CPU
Motivation
1. Recent success in Deep Learning
2. NMC data is similar to Natural Language Processing (NLP) data
3. Certain ad targeting problems can be framed as expressive, hierarchical relationships
Deep Learning: Recent Success
▪ AlphaGo defeated the world’s top professional Go players
▪ Image and speech recognition now match or exceed human performance on benchmark tasks
▪ AI in consumer products: Amazon Echo, Google Home, autonomous driving
All of these recent AI breakthroughs are based on deep neural networks!
NMC Data & NLP Data
NLP data:
Observation: [‘This’, ‘is’, ‘a’, ‘tokenized’, ‘feature’, ‘vector’, ‘used’, ‘for’, ‘machine’, ‘learning’, ‘in’, ‘NLP’]
NMC data:
User: [‘segment: Likes Outdoors’, ‘segment: Male 25-35’, ‘location: New York, NY’]
Deep Learning: Some definitions
Neural Network
Input
Hidden
Output
Neural Network: Neuron
[Figure: a single neuron with three inputs ("Lives in NYC?", weight 0.5; "Orders from Dominos?", weight 0.01; "Works in ad tech?", weight 0.7). With "Lives in NYC?" = Yes and "Works in ad tech?" = Yes, the weighted sum is 1.2 and the neuron outputs Yes.]
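The neuron in the figure can be sketched in a few lines; the bias term and the sigmoid activation are assumptions, since the slide shows only the three weights:

```python
import math

# Inputs mirror the figure: Lives in NYC? = Yes, Orders from Dominos? = No,
# Works in ad tech? = Yes, with the weights shown on the slide.
inputs  = [1.0, 0.0, 1.0]
weights = [0.5, 0.01, 0.7]
bias = 0.0  # assumed; the slide does not show a bias term

# Weighted sum of the inputs, then a sigmoid squashing it into (0, 1).
z = sum(w * x for w, x in zip(weights, inputs)) + bias
activation = 1.0 / (1.0 + math.exp(-z))
output_yes = activation > 0.5
```

The weighted sum comes out to 1.2, matching the value in the figure, and the neuron fires.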
A Deep Neural Network
Neural Network Phases
Training Inference
NMC high-level architecture
Machine Learning (Training)
(Inference)
Definition Summary
● Training
● Inference
○ Matrix Multiplication
● Nodes
● Layers
● Network
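Inference on a layer really is just matrix multiplication plus a nonlinearity; a minimal sketch with made-up shapes and random weights:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Illustrative shapes and random weights: a 4-feature input, a 3-node
# hidden layer, a 2-node output. Each layer's inference is one matrix
# multiplication plus an activation.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)

x = np.array([1.0, 0.0, 1.0, 0.5])
hidden = relu(x @ W1 + b1)   # input layer -> hidden layer
output = hidden @ W2 + b2    # hidden layer -> output layer
```

Stacking more hidden layers just chains more of these multiplications, which is why inference cost is dominated by matrix algebra.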
DNN Architecture
Image Processing :: Convolutional Networks
Speech Recognition :: Recurrent Networks
AlphaGo :: Reinforcement Learning
An Architecture Example: Conv Nets
A Fully Connected DNN
A Residual Network
Residual Network Convergence
Figure 1. Convergence of a neural network model without the forward shortcut (regular net)
Figure 2. Convergence of a neural network model with the forward shortcut (residual net)
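The forward shortcut of a residual block can be sketched as y = F(x) + x; weights here are random and the width is illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    # The forward shortcut adds the block's input back onto its output,
    # which eases optimization and speeds convergence (Figure 1 vs 2).
    h = relu(x @ W1)
    return relu(h @ W2 + x)  # skip connection: input/output widths must match

rng = np.random.default_rng(1)
d = 4
x = rng.normal(size=d)
y = residual_block(x, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```

Because the identity path is always available, gradients can flow around the weighted layers, which is the usual explanation for the faster convergence shown in the figures.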
DNN Architecture
For Structured Data
Category            Segment
City Prosperity     World-Class Health, Uptown Elite, Penthouse Chic, Metro High-Flyers
Prestige Positions  Premium Fortunes, Diamond Days, Alpha Families, Bank of Mum and Dad, Empty-Nest Adventure
Multi-level Hierarchical Classification
[Diagram: categories C1, C2, C3, each branching into segments S1 through S6]
Naive Approach
DNN For Multi-Level Hierarchical Classification
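One plausible sketch of a DNN for multi-level hierarchical classification: a shared hidden layer feeding one softmax head per level of the hierarchy (3 categories and 6 segments, matching the diagram). The exact NMC architecture is not shown in the deck, and all weights below are random placeholders:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Shared trunk, two heads: one over 3 categories (C1-C3), one over
# 6 segments (S1-S6).
rng = np.random.default_rng(2)
W_shared = rng.normal(size=(8, 5))
W_cat = rng.normal(size=(5, 3))
W_seg = rng.normal(size=(5, 6))

x = rng.normal(size=8)                 # a user feature vector
h = np.maximum(x @ W_shared, 0.0)      # shared hidden representation
p_category = softmax(h @ W_cat)        # level-1 prediction
p_segment = softmax(h @ W_seg)         # level-2 prediction
```

Sharing the trunk lets both levels learn from the same representation, instead of the naive approach of training an independent classifier per level.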
GPU vs. CPU
Batch Size & Processing Time
We are not batching matrix algebra operations: NMC serving operates on one request at a time!
GPU vs CPU
CPU Computational Improvements
Inference on a Layer ~ Matrix Multiplication
Input
Hidden
Sparse Matrix Multiplication
32x inference improvement, with sub-millisecond model evaluation
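Why sparse multiplication helps: with only a few active features, inference only needs the matching rows of the weight matrix. Sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(1000, 64))   # layer weights: 1000 features in, 64 out

# A user with only 3 active features out of 1000.
x = np.zeros(1000)
active = [5, 42, 900]
x[active] = 1.0

# Dense inference touches every row of W...
out_dense = x @ W
# ...sparse inference sums only the rows for the active features.
out_sparse = W[active].sum(axis=0)

assert np.allclose(out_dense, out_sparse)
```

Skipping the zero rows is where the reported CPU speedup comes from: the work scales with the number of active features, not the full feature width.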
Trimming
● WEAK CONNECTIONS: most connections in a deep neural network are very weak and can be removed
● LOW ACCURACY IMPACT: trimming has very little impact on accuracy
● COMPRESSED DATA: trimmed models can be described by sparse matrices, so the model data is highly compressed
Neural Network Without Trimming
Neural Network With Trimming
Model         Model File Size (MB)   Trimming Threshold   Accuracy   Scoring Time (ms)
Not trimmed   108                    0.0                  13.29      10.0
Trimmed       2.7                    0.001                13.30      0.22
Trimming: Space, Time & Performance
50x inference improvement, in CPU time and storage
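A sketch of threshold-based trimming; the threshold and weight scale below are chosen for illustration, not the production values from the table:

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(scale=0.05, size=(100, 50))   # a layer's weight matrix

# Trim: zero every connection whose magnitude is below the threshold.
THRESHOLD = 0.05
W_trimmed = np.where(np.abs(W) >= THRESHOLD, W, 0.0)

kept = np.count_nonzero(W_trimmed) / W.size  # fraction of weights surviving

# The trimmed matrix is mostly zeros, so it compresses well as a sparse
# matrix, and predictions change only slightly:
x = rng.normal(size=100)
max_err = np.abs(x @ W - x @ W_trimmed).max()
```

Raising the threshold trades a little accuracy for smaller model files and faster sparse inference, which is the trade shown in the table above.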
Key Takeaways
Architecture:
● Residual Networks saved the day
● Leverage expressive power of DNN for your data
Inference:
● You might not need a GPU for Deep Learning
● Improvements can be made with sparse matrix algebra libraries
● Use trimming
Thanks!
