Netflix talk at ML Platform meetup Sep 2019

Bandit Infrastructure for Personalized Recommendations
Elliot Chow & Fernando Amat Gil
ML Platform Meetup
September 12th 2019

Recommendations at Netflix
User is going to abandon session in...

Recommendations at Netflix
● Personalized Homepage for each member
● Goal: quickly help members find content they love
● Challenge:
○ 150M+ members in 190 countries
○ New content added daily
● Recommendations Valued at: $1B*
*Carlos A. Gomez-Uribe, Neil Hunt: The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Trans. Management Inf. Syst. 6(4): 13:1-13:19 (2016)

Meaning
● What is this title?
● Why should I watch it?

Evidence example I: web homepage

Evidence example II
Billboard
iOS

Evidence personalization
● Multiple choices for each (show, evidence) pair
● For each title, hard to predict what resonates before launch
● Taste can change over time

Evidence personalization (cont’d)
● Asset choreography (slate): image, video, text, cast, etc
Bandit

Bandit setup
● Generic framework to obtain unbiased training data
● Vast literature (beyond the scope of this talk)
● Every session is a mini A/B test
[Figure: mini A/B test with probabilities Pred and 1-Pred; reward?]

Bandit setup for evidence
● Goal: for each mini A/B test collect a sample with
○ Context x
○ Selected action (or treatment) a
○ Propensity p
○ Set of possible actions s
○ Reward r
● Once you have this, very similar to other ML approaches
● Obtaining this data is the hard part infrastructure-wise (Elliot’s section)
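
As a minimal sketch of the sample listed above, one logged record could be represented as follows. The class name, field names, and validation rules are assumptions for illustration; the talk does not specify a schema.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence


@dataclass(frozen=True)
class LoggedSample:
    """One logged 'mini A/B test' (hypothetical field names)."""
    context: Mapping[str, Any]   # x: features available at decision time
    action: str                  # a: the selected action (treatment)
    propensity: float            # p: probability with which `action` was selected
    candidates: Sequence[str]    # s: set of possible actions
    reward: float                # r: observed outcome, e.g. 1.0 for a play

    def __post_init__(self) -> None:
        # Basic sanity checks; a real pipeline may need different ones.
        if self.action not in self.candidates:
            raise ValueError("selected action must be one of the candidates")
        if not 0.0 < self.propensity <= 1.0:
            raise ValueError("propensity must be in (0, 1]")


# Illustrative values only.
sample = LoggedSample(
    context={"country": "US", "device": "tv"},
    action="img_124",
    propensity=0.25,
    candidates=["img_124", "img_1037", "img_7", "img_9"],
    reward=1.0,
)
```

Keeping the candidate set and the propensity on every record is what later allows a different policy to be evaluated on the same logs.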

System requirements
● Scalable: at Netflix’s global size
● Generic: common policy (or model) and offline metrics API
● Flexible:
○ Each recommendation problem has a different attribution, reward, and context definition
○ Easy to add new canvases to the bandit system (image, video, text, cards, etc.)

Policy API
● class Slate(List[Item] items)
○ Vector of ids to recreate a composition (“row_10”, “img_124”, “img_1037”, ..., “synopsis_32”, “card_101”)
[Figure: example composition with its elements numbered 1 to 9]

Policy API
● Slate select(List[List[Item]] items)
○ Given a list of possible slates, return one of them (explore-exploit trade-off)
● Map[Slate, Double] propensity(List[List[Item]] items)
○ Return propensity of each possible slate to debias data
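
The two calls above could be sketched in Python as below. This is an illustration only: epsilon-greedy is swapped in as a simple, well-known explore-exploit rule and is not claimed to be the policy used at Netflix. The class name, the `score` function, and the representation of a slate as a tuple of ids are all assumptions.

```python
import random
from typing import Callable, Dict, Optional, Sequence, Tuple

# Assumption: a slate is a tuple of ids, so it can be used as a dict key.
Slate = Tuple[str, ...]


class EpsilonGreedyPolicy:
    """Hypothetical policy: exploit the best-scoring slate, explore uniformly otherwise."""

    def __init__(self, score: Callable[[Slate], float], epsilon: float = 0.1,
                 rng: Optional[random.Random] = None) -> None:
        self.score = score
        self.epsilon = epsilon
        self.rng = rng or random.Random()

    def propensity(self, slates: Sequence[Slate]) -> Dict[Slate, float]:
        # Assumes the candidate slates are distinct.
        best = max(slates, key=self.score)
        explore = self.epsilon / len(slates)
        return {s: explore + ((1.0 - self.epsilon) if s == best else 0.0)
                for s in slates}

    def select(self, slates: Sequence[Slate]) -> Slate:
        # Sample one slate according to the same propensities that get logged.
        probs = self.propensity(slates)
        return self.rng.choices(list(probs.keys()),
                                weights=list(probs.values()), k=1)[0]


# Illustrative scores; a trained model would supply these in practice.
scores = {
    ("row_10", "img_124"): 0.30,
    ("row_10", "img_1037"): 0.12,
    ("row_10", "img_7"): 0.05,
}
policy = EpsilonGreedyPolicy(score=scores.__getitem__, epsilon=0.1,
                             rng=random.Random(0))
candidates = list(scores)
chosen = policy.select(candidates)
logged_propensity = policy.propensity(candidates)[chosen]
```

Deriving `select` from `propensity` keeps the logged probability consistent with how the slate was actually sampled, which is what debiasing relies on.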

Offline policy evaluation
● Unbiased offline evaluation from logged samples
● Inverse Propensity Scoring (IPS), Doubly Robust, etc
[Figure: six users (User 1 to User 6), each with a logged stochastic treatment, a binary reward (Play?), and a new policy assignment; E[Reward] = 2/3]

Offline policy evaluation system
● Input:
○ Logged sample (x, a, s, r, p)
○ Trained policy
● Output:
○ Metric (IPS, SNIPS, DM, DR, RECAP, etc)
● Observations:
○ Trivially parallelizable (decomposes per sample)
○ Each sample needs to consider all possible candidates
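
A minimal sketch of two of the metrics named above, IPS and SNIPS, assuming the logged sample layout (x, a, s, r, p) and a target policy that returns a probability for every candidate. The function and type names are hypothetical. The per-sample function has no shared state, which is why the computation parallelizes trivially.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple

# (context x, logged action a, candidate set s, reward r, logging propensity p)
Sample = Tuple[dict, str, Sequence[str], float, float]
# Assumption: the target policy maps (x, s) to a probability per candidate.
TargetPolicy = Callable[[dict, Sequence[str]], Dict[str, float]]


def per_sample_terms(sample: Sample, policy: TargetPolicy) -> Tuple[float, float]:
    """Return (importance weight * reward, importance weight) for one sample."""
    x, a, s, r, p = sample
    weight = policy(x, s)[a] / p  # scores all candidates in s, then picks a
    return weight * r, weight


def ips_and_snips(samples: Iterable[Sample], policy: TargetPolicy) -> Tuple[float, float]:
    weighted_rewards = 0.0
    weights = 0.0
    n = 0
    for sample in samples:
        wr, w = per_sample_terms(sample, policy)
        weighted_rewards += wr
        weights += w
        n += 1
    ips = weighted_rewards / n                                    # mean of w_i * r_i
    snips = weighted_rewards / weights if weights > 0 else 0.0    # self-normalized
    return ips, snips


def always_a(x: dict, s: Sequence[str]) -> Dict[str, float]:
    """Toy target policy: always pick action 'A'."""
    return {action: (1.0 if action == "A" else 0.0) for action in s}


# Toy logs collected by a uniform logging policy over two actions (p = 0.5).
logs = [
    ({}, "A", ["A", "B"], 1.0, 0.5),
    ({}, "B", ["A", "B"], 1.0, 0.5),
    ({}, "A", ["A", "B"], 0.0, 0.5),
    ({}, "B", ["A", "B"], 0.0, 0.5),
]
ips, snips = ips_and_snips(logs, always_a)
```

In a distributed setting, `per_sample_terms` would be the map step and the three running sums the reduce step.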

Data: Member Activity
Actions taken by a Netflix member
- Log-in
- Search for “stra”
- Play Stranger Things
...

Data: Intent-to-Treat (ITT)
Describes Netflix’s desire to take a specific action
- Recommend Stranger Things
- Display Image
...

Data: Treatment
The actual experience as seen by the Netflix member
- Showed Stranger Things on Home Page
- Displayed Image
...

[Figure: timeline with Intent-to-treat, Treatment, and Member Actions lanes. Labels: Log In; Play Stranger Things; Home Page; Post-Play; 13 Reasons Why on Popular on Netflix; Stranger Things on My List]

“Closing the Loop”: Join Intent-to-Treat with Treatment
- Did the intent-to-treat take effect as desired?
- Which policy was used?
- What were the propensities?
- What features were used?
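
The join described on this slide can be sketched as a small in-memory batch join. The event fields (`itt_id`, `intended`, `shown`, `policy`, `propensity`, `features`) are hypothetical names chosen for illustration, and the production system is a streaming job rather than a batch function like this one.

```python
from typing import Any, Dict, Iterable, List


def join_itt_with_treatment(
    itt_events: Iterable[Dict[str, Any]],
    treatment_events: Iterable[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Attach the intent-to-treat record to each treatment that references it."""
    itt_by_id = {e["itt_id"]: e for e in itt_events}
    joined = []
    for t in treatment_events:
        itt = itt_by_id.get(t["itt_id"])  # None if the ITT event is missing
        joined.append({
            "itt_id": t["itt_id"],
            "shown": t["shown"],
            # Did the intent-to-treat take effect as desired?
            "took_effect": itt is not None and itt["intended"] == t["shown"],
            # Which policy, propensity, and features produced the decision?
            "policy": itt["policy"] if itt else None,
            "propensity": itt["propensity"] if itt else None,
            "features": itt["features"] if itt else None,
        })
    return joined


# Illustrative events only.
itt_events = [{
    "itt_id": "ID1", "intended": "Image-123", "policy": "policy_v1",
    "propensity": 0.25, "features": {"country": "US"},
}]
treatment_events = [
    {"itt_id": "ID1", "shown": "Image-123"},
    {"itt_id": "ID9", "shown": "Image-7"},   # no matching ITT event
]
joined = join_itt_with_treatment(itt_events, treatment_events)
```

Treatments without a matching intent-to-treat record are kept with empty fields here so they can be counted; dropping them instead is an equally valid choice.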

“Closing the Loop”: Attribution / Reward Computation
- Define reward based on short-/long-term member activity
- Compute in “near” real-time

“Closing the Loop”: Update and Publish Policy
- Incorporate incoming observations
- Evaluate offline replay metrics
- Push updated policy

Standardized Logging
Standardize logging to simplify onboarding
- Common logging format
- Centralized landing topics/tables

Real-time Processing with Flink
Process member activity, treatment, and intent-to-treat events in real-time
- Join intent-to-treat and treatment events
- Prepare data in format amenable to reward computation
- Kafka and Hive outputs

Spark Client For Attribution and Reward Computation
Provide flexible processing of the joined member activity, treatment, and intent-to-treat data
- Events sorted by time
- Unbounded windowing support
- Optimizations for common access patterns

Configurable, Common Attribution
Templatize common attribution methods for simpler access to data
- Select treatment T events
- Attribute member actions A to treatments
- Within time window W
- Match treatment and member action based on key K
- Materialize as Hive tables
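
The template above (treatments T, actions A, window W, key K) can be sketched as a plain Python function. The event layout (`ts` in seconds plus a matching field), the function name, and the binary reward rule are assumptions for illustration; the slides do not define them, and the real implementation is a Spark client writing Hive tables.

```python
from typing import Any, Dict, List


def attribute(
    treatments: List[Dict[str, Any]],   # T: selected treatment events
    actions: List[Dict[str, Any]],      # A: member actions
    match_key: str,                     # K: field that must match
    window: float,                      # W: seconds after the treatment
) -> List[Dict[str, Any]]:
    """For each treatment, collect matching member actions inside the window."""
    rows = []
    for t in sorted(treatments, key=lambda e: e["ts"]):
        matched = [
            a for a in actions
            if a[match_key] == t[match_key] and 0 <= a["ts"] - t["ts"] <= window
        ]
        # Assumed reward rule: 1.0 if any matching action occurred, else 0.0.
        rows.append({**t, "actions": matched, "reward": 1.0 if matched else 0.0})
    return rows


# Illustrative events only.
treatments = [
    {"ts": 100, "title": "stranger_things", "image": "Image-123"},
    {"ts": 100, "title": "13_reasons_why", "image": "Image-456"},
]
actions = [
    {"ts": 160, "title": "stranger_things", "type": "play"},
    {"ts": 5000, "title": "13_reasons_why", "type": "play"},  # outside the window
]
rows = attribute(treatments, actions, match_key="title", window=3600)
```

Each output row corresponds to one treatment, so the result maps naturally onto a table with one record per treatment.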

Microservice Architecture
[Diagram: Edge, Home Page Service, Image Selection Service, and other services (...)]

[Diagram: Edge, Home Page Service, Image Selection Service, ... Labels: “ID1=ITT1: Image-123, Features...”; ID1]

[Diagram: Edge, Home Page Service, Image Selection Service, ... Labels: “ID1→ITT1: Image-123, Features...”; ID1; “ID1, Image-123”]

[Diagram: Edge, Home Page Service, Image Selection Service, ... Labels: Cache; “ITT1: Image-123, Features...”]

[Diagram: Edge, Home Page Service, Image Selection Service, ... Labels: Cache; “ITT1: Image-123, Features...”; ID2; “ID2, Image-123”]

[Diagram: Edge, Home Page Service, Image Selection Service, ... Labels: Cache; “ITT1: Image-123, Features...”; “ID2, Image-123”; ID2; “ID2 → ITT1”]
Stream Processing
- Out-of-order data
- Maintaining state for joins
- Widely-varying traffic patterns

Scale
- 150M+ members
- 190 countries
- 450B+ unique events per day
- Many microservices
- Many personalization algorithms

Used in Production as well as multiple A/B tests
