Deep Learning at NMC
Devin Jones
Director, Machine Learning Lab, Nielsen
Devin Jones
● Machine Learning & Statistics
○ Research
■ Classification
■ Inference
■ Time Series
○ Application
■ Large scale
■ Streaming
Introduction
● Columbia University
○ CS/ML
● Rutgers University
○ Statistics
○ Econ
○ Operations
Research
● Ad Tech (7 years)
Agenda
● ML at NMC
● Intro to Deep Learning
● Deep Learning Research at NMC
Machine Learning at NMC
“Used to build larger audiences from smaller audience segments to create reach for advertisers. In theory, they reflect similar characteristics to a benchmark set of characteristics the original audience segment represents, such as in-market kitchen-appliance shoppers.”
Source: adage.com
The ML Challenge at NMC
Look Alike Modeling
The ML Challenge at NMC: An Example
Supervised Classification for Online Ad Targeting
ML at NMC
Data
Algorithm
Model
Supervised What?
Machine Learning has two main categories:
Supervised Learning: Inferences on Labeled Data
Unsupervised Learning: Inferences on Unlabeled Data
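As a minimal sketch of the distinction (toy one-feature data and thresholds invented purely for illustration):

```python
import numpy as np

# Toy data: one feature per email (e.g. count of spam trigger words).
X = np.array([0.1, 0.2, 0.15, 2.9, 3.1, 3.0])

# Supervised: labels are provided (1 = spam, 0 = ham), so we can learn
# a decision rule directly from the labeled examples.
y = np.array([0, 0, 0, 1, 1, 1])
spam_mean = X[y == 1].mean()
ham_mean = X[y == 0].mean()

def predict(x):
    # Classify by nearest class mean.
    return int(abs(x - spam_mean) < abs(x - ham_mean))

# Unsupervised: no labels; group the same points purely by similarity
# (a single split around the range midpoint, k-means style).
midpoint = (X.min() + X.max()) / 2
clusters = (X > midpoint).astype(int)
```

The supervised rule uses the labels `y`; the unsupervised split recovers the same grouping from the data alone.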
Supervised vs Unsupervised Learning
Supervised: Spam or Ham?
Unsupervised: Clustering Wikipedia Articles
NMC high-level architecture
Machine Learning
The Feature Set & Scale
The quality of the data used for a model will influence the model’s success.
At NMC, we have access to high-dimensional, sparse data:
~4,000 Segments + ~200 Publishers + User Agent + Geographic Info (zip code)
Resulting in over 100k features to choose from
Models are trained in batches of 100,000 to 100,000,000 users, depending on the purpose.
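A sketch of how such a sparse, high-dimensional user record might be represented; the vocabulary, token names, and column indices below are hypothetical:

```python
# Hypothetical token-to-column mapping; the real NMC vocabulary is not
# public. Each segment/publisher/geo token gets one column in a space
# of 100k+ features, per the slide.
vocab = {
    "segment: Likes Outdoors": 0,
    "segment: Male 25-35": 1,
    "publisher: example-news": 2,
    "geo: 10001": 3,
}

def featurize(tokens):
    # A sparse row as {column: value}: storing only the non-zero entries
    # keeps a 100k-wide vector tiny when a user has a handful of features.
    return {vocab[t]: 1.0 for t in tokens if t in vocab}

x = featurize(["segment: Likes Outdoors", "geo: 10001"])
```

Only the active features are stored, which is what makes the 100k-wide feature space tractable.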
ML Algorithms at NMC
To date, we have implemented these algorithms in our real-time scoring engine:
● Binary Linear Models
● Multinomial Linear Models
● Online Learning for Linear Models
● kNN
● Random Forest
And of course… Deep Learning
We score billions of events per day using these models and our ML infrastructure.
Deep Learning at NMC
Topics
● Motivation
● Intro to DL
● NN Architecture
● GPU vs CPU
Motivation
1. Recent success in Deep Learning
2. NMC data is similar to Natural Language Processing (NLP) data
3. Certain ad targeting problems can be framed as expressive, hierarchical relationships
Deep Learning: Recent Success
▪ AlphaGo defeated the world’s top professional Go players
▪ Image and speech recognition now match or exceed human performance on benchmark tasks
▪ AI in consumer products: Amazon Echo, Google Home, autonomous driving
All of these recent AI breakthroughs are based on deep neural networks!
NMC Data & NLP Data
NLP data:
Observation: [‘This’, ‘is’, ‘a’, ‘tokenized’, ‘feature’, ‘vector’, ‘used’, ‘for’, ‘machine’, ‘learning’, ‘in’, ‘NLP’]
NMC data:
User: [‘segment: Likes Outdoors’, ‘segment: Male 25-35’, ‘location: New York, NY’]
Deep Learning: Some definitions
Neural Network
Input
Hidden
Output
Neural Network: Neuron
[Figure: a single neuron with three inputs ("Lives in NYC?", weight 0.5; "Orders from Dominos?", weight 0.01; "Works in ad tech?", weight 0.7). With "Lives in NYC?" = Yes and "Works in ad tech?" = Yes, the weighted sum is 1.2 and the neuron outputs Yes.]
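The neuron in the figure can be sketched in a few lines; the bias term and the sigmoid activation are assumptions, since the slide shows only the three weights:

```python
import math

# Inputs mirror the figure: Lives in NYC? = Yes, Orders from Dominos? = No,
# Works in ad tech? = Yes, with the weights shown on the slide.
inputs  = [1.0, 0.0, 1.0]
weights = [0.5, 0.01, 0.7]
bias = 0.0  # assumed; the slide does not show a bias term

# Weighted sum of the inputs, then a sigmoid squashing it into (0, 1).
z = sum(w * x for w, x in zip(weights, inputs)) + bias
activation = 1.0 / (1.0 + math.exp(-z))
output_yes = activation > 0.5
```

The weighted sum comes out to 1.2, matching the value in the figure, and the neuron fires.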
A Deep Neural Network
Neural Network Phases
Training Inference
NMC high-level architecture
Machine Learning (Training)
(Inference)
Definition Summary
● Training
● Inference
○ Matrix Multiplication
● Nodes
● Layers
● Network
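Inference on a layer really is just matrix multiplication plus a nonlinearity; a minimal sketch with made-up shapes and random weights:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Illustrative shapes and random weights: a 4-feature input, a 3-node
# hidden layer, a 2-node output. Each layer's inference is one matrix
# multiplication plus an activation.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)

x = np.array([1.0, 0.0, 1.0, 0.5])
hidden = relu(x @ W1 + b1)   # input layer -> hidden layer
output = hidden @ W2 + b2    # hidden layer -> output layer
```

Stacking more hidden layers just chains more of these multiplications, which is why inference cost is dominated by matrix algebra.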
DNN Architecture
Image Processing :: Convolutional Networks
Speech Recognition :: Recurrent Networks
AlphaGo :: Reinforcement Learning
An Architecture Example: Conv Nets
A Fully Connected DNN
A Residual Network
Residual Network Convergence
Figure 1. Convergence of a neural network model without the forward shortcut (regular net)
Figure 2. Convergence of a neural network model with the forward shortcut (residual net)
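The forward shortcut of a residual block can be sketched as y = F(x) + x; weights here are random and the width is illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    # The forward shortcut adds the block's input back onto its output,
    # which eases optimization and speeds convergence (Figure 1 vs 2).
    h = relu(x @ W1)
    return relu(h @ W2 + x)  # skip connection: input/output widths must match

rng = np.random.default_rng(1)
d = 4
x = rng.normal(size=d)
y = residual_block(x, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```

Because the identity path is always available, gradients can flow around the weighted layers, which is the usual explanation for the faster convergence shown in the figures.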
DNN Architecture
For Structured Data
Category            Segment
City Prosperity     World-Class Health, Uptown Elite, Penthouse Chic, Metro High-Flyers
Prestige Positions  Premium Fortunes, Diamond Days, Alpha Families, Bank of Mum and Dad, Empty-Nest Adventure
Multi-level Hierarchical Classification
[Diagram: categories C1, C2, C3, each branching into segments S1 through S6]
Naive Approach
DNN For Multi-Level Hierarchical Classification
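One plausible sketch of a DNN for multi-level hierarchical classification: a shared hidden layer feeding one softmax head per level of the hierarchy (3 categories and 6 segments, matching the diagram). The exact NMC architecture is not shown in the deck, and all weights below are random placeholders:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Shared trunk, two heads: one over 3 categories (C1-C3), one over
# 6 segments (S1-S6).
rng = np.random.default_rng(2)
W_shared = rng.normal(size=(8, 5))
W_cat = rng.normal(size=(5, 3))
W_seg = rng.normal(size=(5, 6))

x = rng.normal(size=8)                 # a user feature vector
h = np.maximum(x @ W_shared, 0.0)      # shared hidden representation
p_category = softmax(h @ W_cat)        # level-1 prediction
p_segment = softmax(h @ W_seg)         # level-2 prediction
```

Sharing the trunk lets both levels learn from the same representation, instead of the naive approach of training an independent classifier per level.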
GPU vs. CPU
Batch Size & Processing Time
We are not batching matrix algebra operations: NMC serving operates on one request at a time!
GPU vs CPU
CPU Computational Improvements
Inference on a Layer ~ Matrix Multiplication
Input
Hidden
Sparse Matrix Multiplication
32x inference improvement, with sub-millisecond model evaluation
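Why sparse multiplication helps: with only a few active features, inference only needs the matching rows of the weight matrix. Sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(1000, 64))   # layer weights: 1000 features in, 64 out

# A user with only 3 active features out of 1000.
x = np.zeros(1000)
active = [5, 42, 900]
x[active] = 1.0

# Dense inference touches every row of W...
out_dense = x @ W
# ...sparse inference sums only the rows for the active features.
out_sparse = W[active].sum(axis=0)

assert np.allclose(out_dense, out_sparse)
```

Skipping the zero rows is where the reported CPU speedup comes from: the work scales with the number of active features, not the full feature width.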
Trimming
● WEAK CONNECTIONS: most connections in a deep neural network are very weak and can be removed
● LOW ACCURACY IMPACT: trimming has very little impact on accuracy
● COMPRESSED DATA: trimmed models can be described by sparse matrices, so the model data is highly compressed
Neural Network Without Trimming
Neural Network With Trimming
Model         Model File Size (MB)   Trimming Threshold   Accuracy   Scoring Time (ms)
Not trimmed   108                    0.0                  13.29      10.0
Trimmed       2.7                    0.001                13.30      0.22
Trimming: Space, Time & Performance
50x inference improvement, in CPU time and storage
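A sketch of threshold-based trimming; the threshold and weight scale below are chosen for illustration, not the production values from the table:

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(scale=0.05, size=(100, 50))   # a layer's weight matrix

# Trim: zero every connection whose magnitude is below the threshold.
THRESHOLD = 0.05
W_trimmed = np.where(np.abs(W) >= THRESHOLD, W, 0.0)

kept = np.count_nonzero(W_trimmed) / W.size  # fraction of weights surviving

# The trimmed matrix is mostly zeros, so it compresses well as a sparse
# matrix, and predictions change only slightly:
x = rng.normal(size=100)
max_err = np.abs(x @ W - x @ W_trimmed).max()
```

Raising the threshold trades a little accuracy for smaller model files and faster sparse inference, which is the trade shown in the table above.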
Key Takeaways
Architecture:
● Residual Networks saved the day
● Leverage expressive power of DNN for your data
Inference:
● You might not need a GPU for Deep Learning
● Improvements can be made with sparse matrix algebra libraries
● Use trimming
Thanks!
