Practical Solutions to Exploration Problems
Sam Daulton
Core Data Science, Facebook
Adaptive Experimentation Practical Solutions to Exploration Problems 1 / 68
Overview
1 Adaptive Experimentation
Introduction
2 Direct policy search via Bayesian optimization
Motivating Example
Gaussian Process Regression
Bayesian Optimization
3 Combining online and offline experiments
Value Model Tuning
Multi-Task Bayesian Optimization
4 Open Source Tools
Ax
BoTorch
5 Constrained Bayesian Contextual Bandits
Video Upload Transcoding Optimization
Constrained Thompson Sampling (CTS)
Reward Shaping and Hyperparameter Optimization
Adaptive Experimentation Team
• Horizontal R&D team within
Facebook
• Goal: radically change the way
people run experiments and
develop systems:
• Reduce threshold for
experimentation
• Use RL to robustly solve
explore/exploit problems
• Develop tools to improve and
automate decision-making
under multiple and/or
constrained objectives
Spectrum of Automation
Heterogeneous Connections and Devices
Homogeneous Status Quo Policy
Idea: What if we loaded different numbers of stories depending on the
connection type?
Potential Contextualized Policy
Idea: What if we loaded more posts for better connection types?
Potential Contextualized Policy - Opposite
Idea: What if we loaded fewer posts for better connection types?
Potential Contextualized Policies
Suppose that for each connection type c:
• We could fetch any number of posts $x_c \in [2, 24]$
• Then there are $22^4 = 234{,}256$ possible configurations to test!
Policies as Black-box Functions
The average treatment effect over all individuals can be expected to be
some smooth function of the policy table $x = [x_1, \ldots, x_k]$:
$f(x) : \mathbb{R}^k \to \mathbb{R}$
Black-box Function View of RL
• Turns the "full RL" problem into an infinite-armed bandit problem
$\pi_{x^*} = \arg\max_x g(f(x))$
• Advantages:
• Does not require estimating value functions, state transition functions,
or inference about unobserved states
• Involves virtually no logging of actions, states, or intermediate rewards
• Allows for direct maximization of multiple, delayed rewards
Question: How can we make predictions about long-term outcomes from
a limited number of vector-valued policies?
Gaussian Process (GP) Priors
Gaussian Process (GP) Posteriors
GP regression gives well-calibrated posterior predictive intervals that are
easy to compute
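To make "easy to compute" concrete, here is a minimal NumPy sketch of a GP posterior and its 95% predictive interval. It is an illustration only: the squared-exponential kernel, the fixed hyperparameters, the toy sine data, and the names `rbf_kernel` and `gp_posterior` are assumptions, not taken from the slides.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.01):
    """Posterior mean and standard deviation of the latent function at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss = rbf_kernel(x_test, x_test)
    # Cholesky factorization gives a stable solve of (K + noise * I)^-1 y.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Toy data (hypothetical): four noisy-free observations of sin(x).
x_train = np.array([0.0, 1.0, 2.5, 4.0])
y_train = np.sin(x_train)
x_test = np.linspace(0.0, 4.0, 9)

mean, std = gp_posterior(x_train, y_train, x_test)
lower, upper = mean - 1.96 * std, mean + 1.96 * std  # 95% interval
```

The interval is narrow at observed inputs and widens between them, which is the property Bayesian optimization relies on when deciding where to explore.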
Gaussian Process (GP) Regression
In practice, we find that GP surrogate models fit the data well for many
online experiments.
Other Examples with Continuous Action Spaces
• Value models governing ranking policies: e.g.
$\mathrm{rank}(Z) = x_1 P(\mathrm{click} \mid Z) + x_2 Z_{\text{num friends}}^{x_3} + f(P(\mathrm{spam} \mid Z)/x_4) + \ldots$
• Bit-rate controllers for video and audio streaming
• Data retrieval policies for ML backends
Question: How do we use GP surrogate models to guide the
explore-exploit trade-off?
Bayesian Optimization
Setup
Round 1
Round 2
Round 3
q-Batch Bayesian Optimization
Response surface is maximized sequentially
• Models tell us which regions should be considered for further
assessment
Algorithm 1 BayesianOptimization
1: Run N random initial arms
2: for t = 0 to T do
3:   Fit GP model to data
4:   Use acquisition function to select candidates C
5:   Evaluate C on black-box function
6:   Add new observations to dataset
7: end for
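The loop in Algorithm 1 can be sketched in a few dozen lines of NumPy. This is a toy, not the team's implementation: it assumes a 1-D search space discretized on a grid, a zero-mean GP with a fixed squared-exponential kernel, the standard noiseless Expected Improvement acquisition function, one candidate per round, and a hypothetical quadratic `black_box` standing in for an online experiment.

```python
import math
import numpy as np

def rbf(a, b, lengthscale=0.2):
    """Squared-exponential kernel with unit prior variance."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def gp_predict(x_obs, y_obs, x_new, noise_var=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    k_oo = rbf(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    k_on = rbf(x_obs, x_new)
    solved = np.linalg.solve(k_oo, np.column_stack([y_obs, k_on]))
    mean = k_on.T @ solved[:, 0]
    var = 1.0 - np.sum(k_on * solved[:, 1:], axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Closed-form EI for maximization under a Gaussian posterior."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def black_box(x):
    """Hypothetical objective standing in for an experiment outcome."""
    return -(x - 0.7) ** 2

def bayesian_optimization(n_init=3, n_rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)
    x_obs = rng.uniform(0.0, 1.0, size=n_init)  # step 1: random initial arms
    y_obs = black_box(x_obs)
    for _ in range(n_rounds):
        mean, std = gp_predict(x_obs, y_obs, grid)          # step 3: fit GP
        ei = expected_improvement(mean, std, y_obs.max())   # step 4: acquisition
        candidate = grid[np.argmax(ei)]
        x_obs = np.append(x_obs, candidate)                 # steps 5-6: evaluate, record
        y_obs = np.append(y_obs, black_box(candidate))
    return x_obs, y_obs

x_obs, y_obs = bayesian_optimization()
best_x = x_obs[np.argmax(y_obs)]
```

The talk notes that plain Expected Improvement can degrade under high noise, so a production loop would swap in a noise-aware acquisition function and batch candidate generation.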
Alternatives
Grid Search (Expensive - 81 arms)
Random Search (Cheaper - 25 arms)
• Maxima can be deduced with only a few, smartly chosen arms
Competing Objectives
• Product teams are used to running an A/B test and observing the
outcomes.
• Often, there are multiple competing objectives
If we want full automation, we need to specify more information in
advance: ideally, "the" scalarized objective
Decision Makers Have Multiple Objectives
Decision makers don’t like scalarizations: e.g.
objective = −0.8 · cpu + 1.1 · time spent
Decision makers prefer constraints:
min(cpu) subject to time spent > 0.7
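A small sketch of how the two formulations can disagree on the same data. The four arms and their `cpu` and `time_spent` values are made up for illustration; only the scalarization weights and the 0.7 threshold come from the slides.

```python
# Hypothetical arms: (name, cpu, time_spent). Values are invented for illustration.
arms = [
    ("a", 1.0, 0.90),
    ("b", 0.6, 0.75),
    ("c", 0.4, 0.60),
    ("d", 0.8, 0.95),
]

def scalarized(cpu, time_spent, w_cpu=-0.8, w_time=1.1):
    """Scalarized objective: a fixed trade-off between the two metrics."""
    return w_cpu * cpu + w_time * time_spent

# Formulation 1: maximize the scalarized objective.
best_scalarized = max(arms, key=lambda arm: scalarized(arm[1], arm[2]))

# Formulation 2: minimize cpu subject to time_spent > 0.7.
feasible = [arm for arm in arms if arm[2] > 0.7]
best_constrained = min(feasible, key=lambda arm: arm[1])

print(best_scalarized[0], best_constrained[0])
```

With these numbers the scalarization picks arm `d` while the constrained formulation picks arm `b`, so the choice of formulation changes the decision, not just its description.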
Practical Challenges
• Constrained optimization
• Observations often have high variance, leading to potentially large
measurement error
• High noise levels can degrade the performance of many common
acquisition functions including Expected Improvement
Solution
For more details, see
• Constrained Bayesian Optimization with Noisy Experiments. Letham,
Karrer, Ottoni, & Bakshy. Bayesian Analysis, 2019.
Value Model Tuning
• Ranking teams use value models, which combine multiple predictive
models and features, e.g.
$\mathrm{rank}(Z) = x_1 P(\mathrm{click} \mid Z) + x_2 Z_{\text{num friends}}^{x_3} + f(P(\mathrm{spam} \mid Z)/x_4) + \ldots$
• Not feasible to run sufficiently powered experiments with 20+ arms,
so the team developed a simulator
Simulation Setup
Biased Simulator
Debiasing Simulations with Multi-Task Models
Multi-Task Bayesian Optimization Loop
Algorithm 2 MultiTaskBayesianOptimization
1: Run N random arms online
2: Run M random arms offline with M > N
3: for t = 0 to T do
4: Fit MT-GP model to all data, with each batch as separate task
5: Use NEI to generate q candidates C (e.g. q = 30)
6: Run C on the simulator, fit GP model again
7: Use NEI to generate candidates to run online
8: end for
Example of Multi-Task Bayesian Optimization
Figure: Outcome versus iteration (0 to 40), shown in panels for the objective and the constraints.
Paper
For more details, see
• Bayesian Optimization for Policy Search via Online-Offline
Experimentation. Letham & Bakshy 2019. Forthcoming, arXiv
1904.01049
Open Source Tools
Research to Production
Simple APIs
Adaptive Experimentation in Practice
Experiment Understanding
BoTorch
BoTorch: Building Blocks
Improving Researcher Efficiency
Video Upload Transcoding Optimization
Problem
• System receives requests to upload videos of different source qualities
and file sizes from a variety of network connections and devices.
• To ensure high reliability, a video may be transcoded to be uploaded
at a lower quality
• For each video upload request, we have features about
• the video: file size, duration, source resolution
• the network: country, network type, download speed
• the device
Goal
• Maximize quality preserved without decreasing reliability
Video Upload Transcoding - CB Problem
• Context: features about video, network, device
• Actions: 360p, 480p, 720p, 1080p
• Outcomes: reliability y(x, a)
• Rewards: ?? some function R(x, a, y)
Approach - Bandit Algorithm
Thompson Sampling
• Works well in batch mode
• Hyper-parameter free exploration
• Always "picks the best" codec under a sampled model: picks each codec
with probability equal to the posterior probability that it is the best
Approach - Modeling
Bayesian Linear Model
• Bernoulli likelihood to predict reliability
• Using a neural network feature extractor
• Simple two-layer MLP (50, 4) trained via SGD
• Last layer is a stochastic variational GP with a linear kernel
• Trained via stochastic variational inference using 1000 inducing points
according to space-filling design
Thompson Sampling
Algorithm 3 ThompsonSampling
Input: discrete set of actions $A$, distribution over models $P_0(f)$
1: for $t = 0$ to $T$ do
2:   Sample model $\tilde{f}_t \sim P_t(f \mid X, y)$
3:   Select an action $a_t \leftarrow \arg\max_{a \in A} \mathbb{E}(r_t \mid x_t, a, \tilde{f}_t)$
4:   Observe reward $r_t$
5:   Update distribution $P_{t+1}(f)$
6: end for
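A minimal sketch of the Thompson sampling loop above, simplified to a non-contextual Bernoulli bandit with Beta posteriors. The talk's model is a Bayesian linear model on neural-network features; the Beta-Bernoulli model, the uniform prior, and the success probabilities below are stand-in assumptions chosen to keep the example short.

```python
import numpy as np

def thompson_sampling(true_success_prob, n_rounds=2000, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return pull counts per action."""
    rng = np.random.default_rng(seed)
    n_actions = len(true_success_prob)
    successes = np.ones(n_actions)  # Beta(1, 1) prior for every action
    failures = np.ones(n_actions)
    counts = np.zeros(n_actions, dtype=int)
    for _ in range(n_rounds):
        # Sample one plausible model from the current posterior.
        sampled = rng.beta(successes, failures)
        # Act greedily with respect to the sampled model.
        action = int(np.argmax(sampled))
        # Observe a Bernoulli reward from the (simulated) environment.
        reward = float(rng.random() < true_success_prob[action])
        # Conjugate posterior update for the chosen action only.
        successes[action] += reward
        failures[action] += 1.0 - reward
        counts[action] += 1
    return counts

counts = thompson_sampling([0.2, 0.5, 0.8])  # hypothetical success rates
```

Because each round acts greedily on a fresh posterior sample, exploration needs no tuning parameter, and pulls concentrate on the best action as the posteriors sharpen.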
Issues with Vanilla Thompson Sampling
• Thompson sampling does not account for the constraint
• Change in reliability must be non-negative
• Unclear how to optimally specify reward parameterization
Constrained Thompson Sampling
Algorithm 4 ConstrainedThompsonSampling
1: Input: discrete set of actions $A$, distribution over models $P_0(f)$
2: for $t = 0$ to $T$ do
3:   Receive context $x_t$
4:   Sample model $\tilde{f}_t \sim P_t(f \mid X, y)$
5:   for $a \in A$ do
6:     Estimate outcomes $\tilde{f}_t(x_t, a)$
7:   end for
8:   Fetch action under baseline policy $b \leftarrow \pi_b(x_t)$
9:   Filter feasible actions: $A_{\text{feas}} \leftarrow \{a \in A \mid \tilde{f}_t(x_t, a) \ge \epsilon \cdot \tilde{f}_t(x_t, b)\}$
10:  Select an action $a_t \leftarrow \arg\max_{a \in A_{\text{feas}}} \mathbb{E}(r_t \mid x_t, a, \tilde{f}_t)$
11:  Observe outcome $y_t$
12:  Update distribution $P_{t+1}(f)$
13: end for
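Steps 4 to 10 of Algorithm 4 can be sketched for a single context as follows. This is an illustration under simplifying assumptions: per-action Beta posteriors over reliability replace the talk's model, expected reward is taken as sampled reliability times a fixed per-action success reward, and all counts and reward values are hypothetical.

```python
import numpy as np

ACTIONS = ["360p", "480p", "720p", "1080p"]

def constrained_thompson_step(successes, failures, success_reward,
                              baseline, epsilon, rng):
    """Pick one action index for one context with a sampled feasibility filter."""
    # Step 4 and 6: sample a reliability estimate for every action.
    sampled = rng.beta(successes, failures)
    # Step 9: keep actions whose sampled reliability is at least
    # epsilon times the sampled reliability of the baseline action.
    feasible = np.flatnonzero(sampled >= epsilon * sampled[baseline])
    # Step 10: maximize expected reward among feasible actions.
    expected_reward = sampled[feasible] * success_reward[feasible]
    return int(feasible[np.argmax(expected_reward)])

# Hypothetical, well-informed posteriors: reliability near 0.95, 0.94, 0.93, 0.80.
successes = np.array([9500.0, 9400.0, 9300.0, 8000.0])
failures = np.array([500.0, 600.0, 700.0, 2000.0])
success_reward = np.array([1.0, 1.1, 1.2, 1.5])  # made-up shaped rewards

rng = np.random.default_rng(0)
constrained = constrained_thompson_step(
    successes, failures, success_reward, baseline=0, epsilon=0.95, rng=rng)
# epsilon=0.0 disables the filter; it is used here only for comparison.
unconstrained = constrained_thompson_step(
    successes, failures, success_reward, baseline=0, epsilon=0.0, rng=rng)
print(ACTIONS[constrained], ACTIONS[unconstrained])
```

The baseline action always passes the filter when ε is at most 1, so the feasible set is never empty. In this setup the unconstrained choice chases the high-reward but unreliable 1080p action, while the constrained choice settles on 720p.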
Reward Shaping Setup
Reward Shaping:
• Reward is 0 if the upload is a failure
• Reward is fixed at 1 for a 360p upload success
• Reward is monotonically increasing with quality:
$R(y = 1, a) = 1 + \sum_{a' \le a} w_{a'}$
where
$w_i \in (0.0, 0.2]$
Safety Constraint: $\epsilon \in [0.95, 1.0]$
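A small sketch of this reward parameterization. One assumption is needed to reconcile the formula with the fixed reward of 1 for 360p: the weights are taken to exist only for qualities above 360p, so the sum is empty for a 360p success. The function name and the example weights are hypothetical.

```python
QUALITIES = ["360p", "480p", "720p", "1080p"]

def shaped_reward(success, action, weights):
    """Reward for one upload.

    weights maps each quality above 360p to an increment in (0.0, 0.2].
    """
    for quality, w in weights.items():
        if not 0.0 < w <= 0.2:
            raise ValueError(f"weight for {quality} must be in (0.0, 0.2], got {w}")
    if not success:
        return 0.0  # a failed upload earns nothing
    rank = QUALITIES.index(action)
    # Cumulative sum of increments up to and including the chosen quality;
    # the slice is empty for 360p, which keeps its reward fixed at 1.
    return 1.0 + sum(weights[q] for q in QUALITIES[1 : rank + 1])

weights = {"480p": 0.1, "720p": 0.2, "1080p": 0.05}  # hypothetical values
rewards = [shaped_reward(True, q, weights) for q in QUALITIES]
```

Because every increment is strictly positive, the reward is strictly increasing in quality for any admissible weights, which leaves the weights and ε as the only parameters for the outer Bayesian optimization to tune.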
Reward Shaping Optimization
• Teams care about top-line outcomes:
• Reliability: mean reliability per user
• Quality preserved: mean quality (e.g., 1080p preserved, HD) per user
• Other outcomes: watch time, content production
• Difficult to evaluate these outcomes from purely offline data
Solution: Use Bayesian Optimization (via Ax) using online experiments
(a) 1080p quality preserved (b) Reliability
Figure: GP-modeled response surface of mean percent change in video quality
and reliability relative to the baseline policy. Each point represents a policy
parameterized by reward function hyperparameters and constraint parameter ε.
Thanks
Adaptive Experimentation Team
• Manager: Eytan Bakshy
• Constrained TS: Sam Daulton, Shaun Singh, Drew Dimmery
• BoTorch: Max Balandat, Sam Daulton, Daniel Jiang, Brian Karrer,
Ben Letham
• Ax: Kostya Kashin, Lili Dworkin, Lena Kashtelyan, Ben Letham,
Ashwin Murthy, Shaun Singh, and Drew Dimmery
Papers
• Constrained Bayesian Optimization with Noisy Experiments. Letham
et al. 2019, Bayesian Analysis.
• Bayesian Optimization for Policy Search via Online-Offline
Experimentation. Letham & Bakshy 2019. Forthcoming, arXiv
1904.01049

Facebook Talk at Netflix ML Platform meetup, Sep 2019

Pooyan Jamshidi
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
taeseon ryu
 
Ad

Recently uploaded (20)

GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
James Anderson
 
Droidal: AI Agents Revolutionizing Healthcare
Droidal: AI Agents Revolutionizing HealthcareDroidal: AI Agents Revolutionizing Healthcare
Droidal: AI Agents Revolutionizing Healthcare
Droidal LLC
 
STKI Israel Market Study 2025 final v1 version
STKI Israel Market Study 2025 final v1 versionSTKI Israel Market Study 2025 final v1 version
STKI Israel Market Study 2025 final v1 version
Dr. Jimmy Schwarzkopf
 
Dev Dives: System-to-system integration with UiPath API Workflows
Dev Dives: System-to-system integration with UiPath API WorkflowsDev Dives: System-to-system integration with UiPath API Workflows
Dev Dives: System-to-system integration with UiPath API Workflows
UiPathCommunity
 
Create Your First AI Agent with UiPath Agent Builder
Create Your First AI Agent with UiPath Agent BuilderCreate Your First AI Agent with UiPath Agent Builder
Create Your First AI Agent with UiPath Agent Builder
DianaGray10
 
Gihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai TechnologyGihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai Technology
zainkhurram1111
 
Dr Jimmy Schwarzkopf presentation on the SUMMIT 2025 A
Dr Jimmy Schwarzkopf presentation on the SUMMIT 2025 ADr Jimmy Schwarzkopf presentation on the SUMMIT 2025 A
Dr Jimmy Schwarzkopf presentation on the SUMMIT 2025 A
Dr. Jimmy Schwarzkopf
 
Supercharge Your AI Development with Local LLMs
Supercharge Your AI Development with Local LLMsSupercharge Your AI Development with Local LLMs
Supercharge Your AI Development with Local LLMs
Francesco Corti
 
Cyber Security Legal Framework in Nepal.pptx
Cyber Security Legal Framework in Nepal.pptxCyber Security Legal Framework in Nepal.pptx
Cyber Security Legal Framework in Nepal.pptx
Ghimire B.R.
 
6th Power Grid Model Meetup - 21 May 2025
6th Power Grid Model Meetup - 21 May 20256th Power Grid Model Meetup - 21 May 2025
6th Power Grid Model Meetup - 21 May 2025
DanBrown980551
 
Securiport - A Border Security Company
Securiport  -  A Border Security CompanySecuriport  -  A Border Security Company
Securiport - A Border Security Company
Securiport
 
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
Jasper Oosterveld
 
Co-Constructing Explanations for AI Systems using Provenance
Co-Constructing Explanations for AI Systems using ProvenanceCo-Constructing Explanations for AI Systems using Provenance
Co-Constructing Explanations for AI Systems using Provenance
Paul Groth
 
AI Trends - Mary Meeker
AI Trends - Mary MeekerAI Trends - Mary Meeker
AI Trends - Mary Meeker
Razin Mustafiz
 
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Aaryan Kansari
 
Microsoft Build 2025 takeaways in one presentation
Microsoft Build 2025 takeaways in one presentationMicrosoft Build 2025 takeaways in one presentation
Microsoft Build 2025 takeaways in one presentation
Digitalmara
 
LSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection FunctionLSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection Function
Takahiro Harada
 
Fortinet Certified Associate in Cybersecurity
Fortinet Certified Associate in CybersecurityFortinet Certified Associate in Cybersecurity
Fortinet Certified Associate in Cybersecurity
VICTOR MAESTRE RAMIREZ
 
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptxECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
Jasper Oosterveld
 
New Ways to Reduce Database Costs with ScyllaDB
New Ways to Reduce Database Costs with ScyllaDBNew Ways to Reduce Database Costs with ScyllaDB
New Ways to Reduce Database Costs with ScyllaDB
ScyllaDB
 
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
James Anderson
 
Droidal: AI Agents Revolutionizing Healthcare
Droidal: AI Agents Revolutionizing HealthcareDroidal: AI Agents Revolutionizing Healthcare
Droidal: AI Agents Revolutionizing Healthcare
Droidal LLC
 
STKI Israel Market Study 2025 final v1 version
STKI Israel Market Study 2025 final v1 versionSTKI Israel Market Study 2025 final v1 version
STKI Israel Market Study 2025 final v1 version
Dr. Jimmy Schwarzkopf
 
Dev Dives: System-to-system integration with UiPath API Workflows
Dev Dives: System-to-system integration with UiPath API WorkflowsDev Dives: System-to-system integration with UiPath API Workflows
Dev Dives: System-to-system integration with UiPath API Workflows
UiPathCommunity
 
Create Your First AI Agent with UiPath Agent Builder
Create Your First AI Agent with UiPath Agent BuilderCreate Your First AI Agent with UiPath Agent Builder
Create Your First AI Agent with UiPath Agent Builder
DianaGray10
 
Gihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai TechnologyGihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai Technology
zainkhurram1111
 
Dr Jimmy Schwarzkopf presentation on the SUMMIT 2025 A
Dr Jimmy Schwarzkopf presentation on the SUMMIT 2025 ADr Jimmy Schwarzkopf presentation on the SUMMIT 2025 A
Dr Jimmy Schwarzkopf presentation on the SUMMIT 2025 A
Dr. Jimmy Schwarzkopf
 
Supercharge Your AI Development with Local LLMs
Supercharge Your AI Development with Local LLMsSupercharge Your AI Development with Local LLMs
Supercharge Your AI Development with Local LLMs
Francesco Corti
 
Cyber Security Legal Framework in Nepal.pptx
Cyber Security Legal Framework in Nepal.pptxCyber Security Legal Framework in Nepal.pptx
Cyber Security Legal Framework in Nepal.pptx
Ghimire B.R.
 
6th Power Grid Model Meetup - 21 May 2025
6th Power Grid Model Meetup - 21 May 20256th Power Grid Model Meetup - 21 May 2025
6th Power Grid Model Meetup - 21 May 2025
DanBrown980551
 
Securiport - A Border Security Company
Securiport  -  A Border Security CompanySecuriport  -  A Border Security Company
Securiport - A Border Security Company
Securiport
 
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
Jasper Oosterveld
 
Co-Constructing Explanations for AI Systems using Provenance
Co-Constructing Explanations for AI Systems using ProvenanceCo-Constructing Explanations for AI Systems using Provenance
Co-Constructing Explanations for AI Systems using Provenance
Paul Groth
 
AI Trends - Mary Meeker
AI Trends - Mary MeekerAI Trends - Mary Meeker
AI Trends - Mary Meeker
Razin Mustafiz
 
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Aaryan Kansari
 
Microsoft Build 2025 takeaways in one presentation
Microsoft Build 2025 takeaways in one presentationMicrosoft Build 2025 takeaways in one presentation
Microsoft Build 2025 takeaways in one presentation
Digitalmara
 
LSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection FunctionLSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection Function
Takahiro Harada
 
Fortinet Certified Associate in Cybersecurity
Fortinet Certified Associate in CybersecurityFortinet Certified Associate in Cybersecurity
Fortinet Certified Associate in Cybersecurity
VICTOR MAESTRE RAMIREZ
 
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptxECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
Jasper Oosterveld
 
New Ways to Reduce Database Costs with ScyllaDB
New Ways to Reduce Database Costs with ScyllaDBNew Ways to Reduce Database Costs with ScyllaDB
New Ways to Reduce Database Costs with ScyllaDB
ScyllaDB
 
Ad

Facebook Talk at Netflix ML Platform meetup Sep 2019

  • 1. Practical Solutions to Exploration Problems Sam Daulton Core Data Science, Facebook Adaptive Experimentation Practical Solutions to Exploration Problems 1 / 68
  • 2. Overview: (1) Adaptive Experimentation: Introduction. (2) Direct policy search via Bayesian optimization: Motivating Example; Gaussian Process Regression; Bayesian Optimization. (3) Combining online and offline experiments: Value Model Tuning; Multi-Task Bayesian Optimization. (4) Open Source Tools: Ax; BoTorch. (5) Constrained Bayesian Contextual Bandits: Video Upload Transcoding Optimization; Constrained Thompson Sampling (CTS); Reward Shaping and Hyperparameter Optimization.
  • 3. Adaptive Experimentation Team • Horizontal R&D team within Facebook • Goal: radically change the way people run experiments and develop systems: • Reduce threshold for experimentation • Use RL to robustly solve explore/exploit problems • Develop tools to improve and automate decision-making under multiple and/or constrained objectives
  • 4. Adaptive Experimentation Team
  • 5. Spectrum of Automation
  • 6. Overview (outline repeated from slide 2)
  • 7. Heterogeneous Connections and Devices
  • 8. Homogeneous Status Quo Policy
  • 9. Homogeneous Status Quo Policy. Idea: What if we loaded different numbers of stories depending on the connection type?
  • 10. Potential Contextualized Policy. Idea: What if we loaded more posts for better connection types?
  • 11. Potential Contextualized Policy - Opposite. Idea: What if we loaded fewer posts for better connection types?
  • 12. Potential Contextualized Policies. Suppose that for each connection type $c$: • We could fetch any number of posts $x_c \in [2, 24]$ • Then there are $22^4 = 234{,}256$ possible configurations to test!
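The configuration count on slide 12 can be checked with a short calculation. This is a minimal sketch that assumes 22 candidate post counts per connection type and 4 connection types, which is what the figure on the slide implies; the variable names are hypothetical.

```python
# Assumed values, read off the count quoted on the slide (22^4).
levels_per_type = 22        # candidate post counts per connection type
num_connection_types = 4    # number of connection types

# Each policy picks one post count per connection type independently,
# so the number of distinct policy tables is levels ** types.
num_policies = levels_per_type ** num_connection_types
print(num_policies)  # 234256
```

Even this small policy table is far too large to A/B test exhaustively, which motivates treating the policy as a black-box function.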
  • 13. Policies as Black-box Functions. The average treatment effect over all individuals can be expected to be some smooth function of the policy table $x = [x_1, \dots, x_k]$: $f(x) : \mathbb{R}^k \to \mathbb{R}$
  • 14. Black-box Function View of RL • Turns the "full RL" problem into an infinite-armed bandit problem: $\pi_{x^*} = \arg\max_x g(f(x))$ • Advantages: • Does not require estimating value functions or state transition functions, or making inferences about unobserved states • Involves virtually no logging of actions, states, or intermediate rewards • Allows for direct maximization of multiple, delayed rewards. Question: How can we make predictions about long-term outcomes from a limited number of vector-valued policies?
  • 15. Gaussian Process (GP) Priors
  • 16. Gaussian Process (GP) Priors
  • 17. Gaussian Process (GP) Posteriors
  • 18. Gaussian Process (GP) Posteriors
  • 19. Gaussian Process (GP) Posteriors. GP regression gives well-calibrated posterior predictive intervals that are easy to compute.
  • 20. Gaussian Process (GP) Regression. In practice, we find that GP surrogate models fit the data well for many online experiments.
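To make the GP posterior concrete, here is a minimal sketch of GP regression in one dimension using only the Python standard library. The squared-exponential kernel, the lengthscale, the noise level, and the sine test function are all assumptions chosen for illustration; the slides do not specify them, and this is not the model used in Ax or BoTorch.

```python
import math

def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel between two scalars (unit prior variance)."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)        # (K + noise I)^{-1} y
    v = solve(K, k_star)        # (K + noise I)^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))

# Hypothetical observations of a smooth response.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [math.sin(2 * math.pi * x) for x in xs]

mean, sd = gp_posterior(xs, ys, 0.25)        # at an observed point
far_mean, far_sd = gp_posterior(xs, ys, 3.0)  # far from all data
```

At an observed point the posterior standard deviation collapses toward the noise level, while far from the data it returns to the prior standard deviation. This is the behaviour that makes the predictive intervals useful for deciding where to experiment next.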
  • 21. Other Examples with Continuous Action Spaces • Value models governing ranking policies: e.g. $\mathrm{rank}(Z) = x_1 P(\mathrm{click} \mid Z) + x_2 Z_{\text{num friends}}^{x_3} + f(P(\mathrm{spam} \mid Z)/x_4) + \dots$ • Bit-rate controllers for video and audio streaming • Data retrieval policies for ML backends. Question: How do we use GP surrogate models to guide the explore-exploit trade-off?
  • 22. Bayesian Optimization Setup
  • 23. Bayesian Optimization Setup
  • 24. Bayesian Optimization Round 1
  • 25. Bayesian Optimization Round 2
  • 26. Bayesian Optimization Round 3
  • 27. Bayesian Optimization: q-Batch Bayesian Optimization
  • 28. Bayesian Optimization. Response surface is maximized sequentially • Models tell us which regions should be considered for further assessment
  • 29. Bayesian Optimization. Algorithm 1 BayesianOptimization
      1: Run N random initial arms
      2: for t = 0 to T do
      3:   Fit GP model to data
      4:   Use acquisition function to select candidates C
      5:   Evaluate C on black-box function
      6:   Add new observations to dataset
      7: end for
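The loop in Algorithm 1 can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the Ax or BoTorch implementation: it uses a one-dimensional toy `black_box`, a fixed squared-exponential kernel, plain Expected Improvement as the acquisition function, a grid of candidates, and one candidate per round rather than a batch C. All function names are hypothetical.

```python
import math
import random

def rbf(a, b, lengthscale=0.3):
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Step 3: zero-mean GP posterior mean and standard deviation."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = 1.0 - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))

def expected_improvement(mean, sd, best):
    """Step 4: Expected Improvement over the best observed value."""
    if sd < 1e-9:
        return 0.0
    z = (mean - best) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sd * (z * cdf + pdf)

def black_box(x):
    """Hypothetical objective with its maximum at x = 0.6."""
    return -(x - 0.6) ** 2

rng = random.Random(0)
xs = [rng.random() for _ in range(3)]          # step 1: random initial arms
ys = [black_box(x) for x in xs]
grid = [i / 50 for i in range(51)]             # candidate policies in [0, 1]

for t in range(10):                            # step 2
    best = max(ys)
    scores = []
    for x in grid:
        mean, sd = gp_posterior(xs, ys, x)
        scores.append(expected_improvement(mean, sd, best))
    candidate = grid[scores.index(max(scores))]
    xs.append(candidate)                       # steps 5 and 6
    ys.append(black_box(candidate))

best_x = xs[ys.index(max(ys))]
```

With noisy online measurements, plain Expected Improvement is a weak choice; it is used here only because it fits in a few lines.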
  • 30. Alternatives: Grid Search (Expensive - 81 arms)
  • 31. Alternatives: Random Search (Cheaper - 25 arms) • Maxima can be deduced with only a few, smartly chosen arms
  • 32. Competing Objectives • Product teams are used to running an A/B test and observing the outcomes. • Often, there are multiple competing objectives
  • 33. Competing Objectives. If we want full automation, we need to specify more information in advance: ideally, "the" scalarized objective
  • 34. Competing Objectives: Decision Makers Have Multiple Objectives
  • 35. Competing Objectives. Decision makers don't like scalarizations: e.g. objective = −0.8 · cpu + 1.1 · time spent
  • 36. Competing Objectives. Decision makers prefer constraints: min(cpu) subject to time spent > 0.7
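The difference between the scalarized and the constrained formulation on slides 35 and 36 can be seen on a toy example. The three policies and their measurements below are made up for illustration; only the weights (−0.8, 1.1) and the threshold (0.7) come from the slides.

```python
# Hypothetical measured outcomes for three candidate policies.
policies = {
    "A": {"cpu": 1.0, "time_spent": 0.95},
    "B": {"cpu": 0.6, "time_spent": 0.72},
    "C": {"cpu": 0.3, "time_spent": 0.65},
}

def scalarized(outcome):
    """Scalarized objective from slide 35 (higher is better)."""
    return -0.8 * outcome["cpu"] + 1.1 * outcome["time_spent"]

# Scalarization: pick the policy with the largest weighted sum.
scalarized_choice = max(policies, key=lambda name: scalarized(policies[name]))

# Constraint form from slide 36: minimize cpu subject to time spent > 0.7.
feasible = [name for name, o in policies.items() if o["time_spent"] > 0.7]
constrained_choice = min(feasible, key=lambda name: policies[name]["cpu"])

print(scalarized_choice, constrained_choice)  # C B
```

Here the weighted sum prefers policy C even though it violates the time-spent requirement, while the constrained form excludes it. That gap is one reason a decision maker may find the constraint easier to state and defend than a set of weights.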
  • 37. Practical Challenges • Constrained optimization • Observations often have high variance, leading to potentially large measurement error • High noise levels can degrade the performance of many common acquisition functions, including Expected Improvement
  • 38. Solution. For more details, see • Constrained Bayesian Optimization with Noisy Experiments. Letham, Karrer, Ottoni, & Bakshy. Bayesian Analysis, 2019.
  • 39. Overview (outline repeated from slide 2)
  • 40. Value Model Tuning • Ranking teams use value models, which combine multiple predictive models and features, e.g. $\mathrm{rank}(Z) = x_1 P(\mathrm{click} \mid Z) + x_2 Z_{\text{num friends}}^{x_3} + f(P(\mathrm{spam} \mid Z)/x_4) + \dots$ • It is not feasible to run sufficiently powered experiments with 20+ arms, so the team developed a simulator
  • 41. Simulation Setup
  • 42. Biased Simulator
  • 43. Debiasing Simulations with Multi-Task Models
  • 44. Debiasing Simulations with Multi-Task Models
  • 45. Multi-Task Bayesian Optimization Loop. Algorithm 2 MultiTaskBayesianOptimization
      1: Run N random arms online
      2: Run M random arms offline with M > N
      3: for t = 0 to T do
      4:   Fit MT-GP model to all data, with each batch as separate task
      5:   Use NEI to generate q candidates C (e.g. q = 30)
      6:   Run C on the simulator, fit GP model again
      7:   Use NEI to generate candidates to run online
      8: end for
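The model-fitting step of Algorithm 2 relies on a multi-task GP that borrows strength from many cheap simulator runs to predict the online outcome. The sketch below shows that idea only, not the full loop and not the model from the paper: it assumes an intrinsic coregionalization kernel with a hand-set task covariance `B`, a fixed squared-exponential kernel, and made-up `online` and `simulator` functions in which the simulator has a constant bias.

```python
import math

def rbf(a, b, lengthscale=0.3):
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

# Assumed task covariance: task 0 = simulator, task 1 = online.
B = [[1.0, 0.9], [0.9, 1.0]]

def icm_kernel(p, q):
    """Covariance between (x, task) pairs: task covariance times input kernel."""
    (x1, t1), (x2, t2) = p, q
    return B[t1][t2] * rbf(x1, x2)

def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def predict_mean(train, ys, point, noise=1e-4):
    """Posterior mean of the zero-mean multi-task GP at an (x, task) pair."""
    K = [[icm_kernel(p, q) + (noise if i == j else 0.0)
          for j, q in enumerate(train)] for i, p in enumerate(train)]
    k_star = [icm_kernel(p, point) for p in train]
    return sum(k * a for k, a in zip(k_star, solve(K, ys)))

def online(x):
    """Hypothetical true online response."""
    return math.sin(3 * x)

def simulator(x):
    """Hypothetical simulator: same shape as online, with a constant bias."""
    return math.sin(3 * x) + 0.5

sim_x = [i / 10 for i in range(11)]   # many offline arms (M > N)
online_x = [0.1, 0.9]                 # few online arms
train = [(x, 0) for x in sim_x] + [(x, 1) for x in online_x]
ys = [simulator(x) for x in sim_x] + [online(x) for x in online_x]

# Predict the online outcome at a policy that was only run on the simulator.
prediction = predict_mean(train, ys, (0.5, 1))
sim_error = abs(simulator(0.5) - online(0.5))
mt_error = abs(prediction - online(0.5))
```

In this toy setup the two online arms are enough for the joint model to correct most of the simulator's bias at an arm that was never run online. In practice the task covariance would be learned from data rather than fixed by hand.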
  • 46. Example of Multi-Task Bayesian Optimization [Figure: outcome by iteration (0 to 40), with panels for the objective and the constraints]
  • 47. Paper. For more details, see • Bayesian Optimization for Policy Search via Online-Offline Experimentation. Letham & Bakshy 2019. Forthcoming, arXiv 1904.01049
  • 48. Overview (outline repeated from slide 2)
  • 49. Open Source Tools
  • 50. Research to Production
  • 51. Simple APIs
  • 52. Adaptive Experimentation in Practice
  • 53. Experiment Understanding
  • 54. BoTorch
  • 55. BoTorch: Building Blocks
  • 56. Improving Researcher Efficiency
  • 57. Overview (outline repeated from slide 2)
  • 58. Video Upload Transcoding Optimization. Problem • The system receives requests to upload videos of different source qualities and file sizes from a variety of network connections and devices. • To ensure high reliability, a video may be transcoded to be uploaded at a lower quality • For each video upload request, we have features about • the video: file size, duration, source resolution • the network: country, network type, download speed • the device. Goal • Maximize quality preserved without decreasing reliability
  • 59. Video Upload Transcoding - CB Problem • Context: features about video, network, device • Actions: 360p, 480p, 720p, 1080p • Outcomes: reliability $y(x, a)$ • Rewards: ?? some function $R(x, a, y)$
  • 60. Approach - Bandit Algorithm: Thompson Sampling • Works well in batch mode • Hyper-parameter free exploration • Always "picks the best" codec: picks each codec with probability proportional to its probability of being the best
  • 61. Approach - Modeling: Bayesian Linear Model • Bernoulli likelihood to predict reliability • Uses a neural network feature extractor • Simple two-layer MLP (50, 4) trained via SGD • Last layer is a stochastic variational GP with a linear kernel • Trained via stochastic variational inference using 1000 inducing points chosen according to a space-filling design
  • 62. Thompson Sampling. Algorithm 3 ThompsonSampling
      Input: discrete set of actions $A$, distribution over models $P_0(f)$
      1: for t = 0 to T do
      2:   Sample model $\tilde{f}_t \sim P_t(f \mid X, y)$
      3:   Select an action $a_t \leftarrow \arg\max_{a \in A} \mathbb{E}(r_t \mid x_t, a, \tilde{f}_t)$
      4:   Observe reward $r_t$
      5:   Update distribution $P_{t+1}(f)$
      6: end for
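Algorithm 3 is easiest to see in its simplest form. The sketch below swaps the contextual model from the talk for independent Beta-Bernoulli posteriors, one per action, with no context at all; the success rates and the function name are hypothetical.

```python
import random

def thompson_sampling(true_rates, num_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was pulled."""
    rng = random.Random(seed)
    num_arms = len(true_rates)
    successes = [0] * num_arms
    failures = [0] * num_arms
    pulls = [0] * num_arms
    for _ in range(num_rounds):
        # Sample one plausible success rate per arm from its Beta posterior.
        samples = [rng.betavariate(1 + s, 1 + f)
                   for s, f in zip(successes, failures)]
        # Act greedily with respect to the sampled model.
        action = samples.index(max(samples))
        # Observe a Bernoulli reward from the (simulated) environment.
        reward = 1 if rng.random() < true_rates[action] else 0
        # Update the posterior of the chosen arm.
        successes[action] += reward
        failures[action] += 1 - reward
        pulls[action] += 1
    return pulls

# Hypothetical true success rates for three actions.
pulls = thompson_sampling([0.2, 0.5, 0.8], num_rounds=2000)
```

Because each round acts greedily on a posterior sample, an arm is chosen about as often as the posterior believes it is the best. Exploration therefore fades on its own as the posteriors sharpen, with no exploration hyperparameter to tune.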
  • 63. Issues with Vanilla Thompson Sampling • Thompson sampling does not account for the constraint: the change in reliability must be non-negative • Unclear how to optimally specify reward parameterization
  • 64. Constrained Thompson Sampling. Algorithm 4 ConstrainedThompsonSampling
      1: Input: discrete set of actions $A$, distribution over models $P_0(f)$
      2: for t = 0 to T do
      3:   Receive context $x_t$
      4:   Sample model $\tilde{f}_t \sim P_t(f \mid X, y)$
      5:   for $a \in A$ do
      6:     Estimate outcomes $\tilde{f}_t(x_t, a)$
      7:   end for
      8:   Fetch action under baseline policy $b \leftarrow \pi_b(x_t)$
      9:   Filter feasible actions: $A_{\text{feas}} \leftarrow \{a \in A \mid \tilde{f}_t(x_t, a) \ge \varepsilon \cdot \tilde{f}_t(x_t, b)\}$
      10:  Select an action $a_t \leftarrow \arg\max_{a \in A_{\text{feas}}} \mathbb{E}(r_t \mid x_t, a, \tilde{f}_t)$
      11:  Observe outcome $y_t$
      12:  Update distribution $P_{t+1}(f)$
      13: end for
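The feasibility filter in Algorithm 4 can be illustrated with a context-free simplification. This sketch makes several assumptions that are not in the talk: independent Beta posteriors stand in for the contextual model, the baseline policy always picks 360p, the baseline's reliability is already well estimated from logged data, and the reliabilities and rewards are invented numbers.

```python
import random

ACTIONS = ["360p", "480p", "720p", "1080p"]
# Hypothetical true upload reliability and reward-on-success per action.
TRUE_RELIABILITY = {"360p": 0.95, "480p": 0.94, "720p": 0.80, "1080p": 0.70}
REWARD_ON_SUCCESS = {"360p": 1.0, "480p": 1.1, "720p": 1.3, "1080p": 1.5}
BASELINE = "360p"   # assumed baseline policy: always choose 360p
EPSILON = 0.95      # safety constraint parameter

def constrained_thompson_sampling(num_rounds, seed=0):
    rng = random.Random(seed)
    # Beta posterior parameters [alpha, beta] for each action's reliability.
    posterior = {a: [1.0, 1.0] for a in ACTIONS}
    # Assumption: logged baseline data gives an informative baseline posterior.
    posterior[BASELINE] = [951.0, 51.0]
    counts = {a: 0 for a in ACTIONS}
    for _ in range(num_rounds):
        # Sample a model: one reliability draw per action.
        sampled = {a: rng.betavariate(*posterior[a]) for a in ACTIONS}
        # Keep actions whose sampled reliability is within EPSILON of baseline.
        threshold = EPSILON * sampled[BASELINE]
        feasible = [a for a in ACTIONS if sampled[a] >= threshold]
        # Maximize expected reward under the sampled model among feasible actions.
        action = max(feasible, key=lambda a: REWARD_ON_SUCCESS[a] * sampled[a])
        # Observe the outcome and update the chosen action's posterior.
        success = rng.random() < TRUE_RELIABILITY[action]
        posterior[action][0 if success else 1] += 1.0
        counts[action] += 1
    return counts

counts = constrained_thompson_sampling(num_rounds=5000)
most_chosen = max(counts, key=counts.get)
```

In this setup 1080p has the highest expected reward, but it fails the reliability filter once its posterior is learned, so the policy settles on 480p. The baseline action is always feasible because ε ≤ 1, which guarantees the feasible set is never empty.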
  • 65. Reward Shaping Setup. Reward Shaping: • Reward is 0 if the upload is a failure • Reward is fixed at 1 for a 360p upload success • Reward is monotonically increasing with quality: $R(y = 1, a) = 1 + \sum_{a' \le a} w_{a'}$ where $w_i \in (0.0, 0.2]$. Safety Constraint: $\varepsilon \in [0.95, 1.0]$
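The shaped reward on slide 65 can be written as a small function. This is a sketch under one assumption: since a 360p success is fixed at reward 1, the sum is taken over the quality levels above 360p up to the chosen action, with one increment per level. The increment values and names below are hypothetical.

```python
ACTIONS = ["360p", "480p", "720p", "1080p"]

def shaped_reward(success, action, increments):
    """Reward is 0 on failure, 1 for a 360p success, and grows with quality.

    `increments` holds one positive weight per quality step above 360p,
    each assumed to lie in (0.0, 0.2].
    """
    if not success:
        return 0.0
    steps = ACTIONS.index(action)   # number of quality steps above 360p
    return 1.0 + sum(increments[:steps])

# Hypothetical reward hyperparameters for 480p, 720p, and 1080p.
w = [0.1, 0.2, 0.05]
rewards = [shaped_reward(True, a, w) for a in ACTIONS]
```

Parameterizing the reward by positive increments keeps it monotone in quality for any setting of the weights. An outer optimizer can then search over the increments and ε without ever producing an inverted reward ordering.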
  • 66. Reward Shaping Optimization • Teams care about top-line outcomes: • Reliability: mean reliability per user • Quality preserved: mean quality (e.g., 1080p preserved, HD) per user • Other outcomes: watch time, content production • It is difficult to evaluate these outcomes from purely offline data. Solution: Use Bayesian Optimization (via Ax) with online experiments
  • 67. Reward Shaping Optimization. Figure: GP-modeled response surface of mean percent change in video quality and reliability relative to the baseline policy, with panels (a) 1080p quality preserved and (b) Reliability. Each point represents a policy parameterized by reward function hyperparameters and constraint parameter $\varepsilon$.
  • 68. Reward Shaping Optimization
  • 69. Thanks. Adaptive Experimentation Team • Manager: Eytan Bakshy • Constrained TS: Sam Daulton, Shaun Singh, Drew Dimmery • BoTorch: Max Balandat, Sam Daulton, Daniel Jiang, Brian Karrer, Ben Letham • Ax: Kostya Kashin, Lili Dworkin, Lena Kashtelyan, Ben Letham, Ashwin Murthy, Shaun Singh, and Drew Dimmery. Papers • Constrained Bayesian Optimization with Noisy Experiments. Letham et al. 2019, Bayesian Analysis. • Bayesian Optimization for Policy Search via Online-Offline Experimentation. Letham & Bakshy 2019. Forthcoming, arXiv 1904.01049