In this talk at the Netflix Machine Learning Platform Meetup on 12 Sep 2019, Sam Daulton from Facebook discusses "Practical Solutions to real-world exploration problems".
LinkedIn talk at Netflix ML Platform meetup Sep 2019 Faisal Siddiqi
In this talk at the Netflix Machine Learning Platform Meetup on 12 Sep 2019, Kinjal Basu from LinkedIn discussed "Online Parameter Selection for Web-based Ranking via Bayesian Optimization".
Netflix talk at ML Platform meetup Sep 2019 Faisal Siddiqi
In this talk at the Netflix Machine Learning Platform Meetup on 12 Sep 2019, Fernando Amat and Elliot Chow from Netflix talk about the Bandit infrastructure for Personalized Recommendations.
The document describes Dropbox's machine learning infrastructure and platform. It discusses how the platform provides scalable access to Dropbox's large data sources for offline and online ML use cases. The platform aims to accelerate ML development at Dropbox by standardizing workflows, automating processes, and making ML deployment and experimentation easy. It utilizes various services like Antenna for activity data and dbxlearn for distributed training across Dropbox and AWS resources. The platform supports all stages of the ML lifecycle from data preparation to model deployment and monitoring.
This document proposes a calibrated recommendations approach that aims to provide recommendations that reflect all of a user's interests in correct proportions. Standard recommender systems trained for accuracy can lead to unbalanced recommendations that amplify a user's main interests and crowd out lesser interests. The calibrated recommendations approach uses a post-processing re-ranking step to optimize a submodular calibration metric, balancing accuracy and fairness by recommending items from all a user's interests in their correct proportions. Experiments on MovieLens data show that calibration can be improved significantly without degrading accuracy much.
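The re-ranking step described above can be illustrated with a minimal sketch. It assumes the calibration metric is a KL divergence between the user's historical genre distribution `p` and the genre distribution `q` of the recommended list, and that the list is built greedily with a trade-off weight `lam`. The function names, the smoothing constant `alpha`, and the toy data are hypothetical and are not taken from the summarized document.

```python
import math

def genre_distribution(items, item_genres):
    """Average the per-item genre distributions of a list of items."""
    totals = {}
    for item in items:
        for genre, weight in item_genres[item].items():
            totals[genre] = totals.get(genre, 0.0) + weight
    n = len(items)
    return {g: w / n for g, w in totals.items()} if n else {}

def kl_calibration(p, q, alpha=0.01):
    """KL(p || q~), where q~ = (1 - alpha) * q + alpha * p avoids division by zero."""
    total = 0.0
    for genre, p_g in p.items():
        if p_g > 0:
            q_g = (1 - alpha) * q.get(genre, 0.0) + alpha * p_g
            total += p_g * math.log(p_g / q_g)
    return total

def calibrated_rerank(scores, item_genres, p, k, lam=0.5):
    """Greedily pick k items maximizing (1 - lam) * relevance - lam * KL."""
    chosen = []
    remaining = set(scores)
    while remaining and len(chosen) < k:
        def objective(item):
            candidate = chosen + [item]
            relevance = sum(scores[i] for i in candidate)
            q = genre_distribution(candidate, item_genres)
            return (1 - lam) * relevance - lam * kl_calibration(p, q)
        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=objective)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy data (assumed): a user who watches 70% action and 30% romance.
item_genres = {
    "a1": {"action": 1.0}, "a2": {"action": 1.0},
    "a3": {"action": 1.0}, "r1": {"romance": 1.0},
}
scores = {"a1": 0.9, "a2": 0.8, "a3": 0.7, "r1": 0.6}
history = {"action": 0.7, "romance": 0.3}
print(calibrated_rerank(scores, item_genres, history, k=3, lam=0.0))
print(calibrated_rerank(scores, item_genres, history, k=3, lam=0.9))
```

With `lam=0.0` the ranking is driven by relevance alone; raising `lam` pulls the lesser interest into the list at a small cost in summed relevance.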
Déjà Vu: The Importance of Time and Causality in Recommender Systems Justin Basilico
This document discusses the importance of time and causality in recommender systems. It summarizes that (1) time and causality are critical aspects that must be considered in data collection, experiment design, algorithms, and system design. (2) Recommender systems operate within a feedback loop where the recommendations influence future user behavior and data, so effects like reinforcement of biases can occur. (3) Both offline and online experimentation are needed to properly evaluate systems and generalization over time.
A Multi-Armed Bandit Framework For Recommendations at Netflix Jaya Kawale
In this talk, we present a general multi-armed bandit framework for recommendations on the Netflix homepage. We present two example case studies using MABs at Netflix - a) Artwork Personalization to recommend personalized visuals for each of our members for the different titles and b) Billboard recommendation to recommend the right title to be watched on the Billboard.
Marketplace in motion - AdKDD keynote - 2020 Roelof van Zwol
This document discusses Pinterest's ads marketplace and optimization strategies. It provides an overview of Pinterest's ads delivery funnel including ranking, auction, and retrieval. It then discusses predicting relevance and engagement through human labels, deep learning models, and multi-task learning. It also covers auction design principles and candidate retrieval using a two-tower deep learning approach. The goal is to maximize long-term value for users, advertisers, and Pinterest across different surfaces and ad formats.
Presentation at the Netflix Expo session at RecSys 2020 virtual conference on 2020-09-24. It provides an overview of recommendation and personalization at Netflix and then highlights some of the things we’ve been working on as well as some important open research questions in the field of recommendations.
Recommendation systems today are widely used across many applications, such as multimedia content platforms, social networks, and ecommerce, to provide suggestions to users that are most likely to fulfill their needs, thereby improving the user experience. Academic research, to date, largely focuses on the performance of recommendation models in terms of ranking quality or accuracy measures, which often don’t directly translate into improvements in the real world. In this talk, we present some of the most interesting challenges that we face in the personalization efforts at Netflix. The goal of this talk is to shed light on challenging research problems in industrial recommendation systems and start a conversation about exciting areas of future research.
ML Infra for Netflix Recommendations - AI NEXTCon talk Faisal Siddiqi
Faisal Siddiqi presented on machine learning infrastructure for recommendations. He outlined Boson and AlgoCommons, two major ML infra components. Boson focuses on offline training for both ad-hoc exploration and production. It provides utilities for data preparation, feature engineering, training, metrics, and visualization. AlgoCommons provides common abstractions and building blocks for ML like data access, feature encoders, predictors, and metrics. It aims for composability, portability, and avoiding training-serving skew.
Crafting Recommenders: the Shallow and the Deep of it! Sudeep Das, Ph.D.
Sudeep Das presented on recommender systems and advances in deep learning approaches. Matrix factorization is still the foundational method for collaborative filtering, but deep learning models are now augmenting these approaches. Deep neural networks can learn hierarchical representations of users and items from raw data like images, text, and sequences of user actions. Models like wide and deep networks combine the strengths of memorization and generalization. Sequence models like recurrent neural networks have also been applied to sessions for next item recommendation.
This document provides an overview of recommender systems for e-commerce. It discusses various recommender approaches including collaborative filtering algorithms like nearest neighbor methods, item-based collaborative filtering, and matrix factorization. It also covers content-based recommendation, classification techniques, addressing challenges like data sparsity and scalability, and hybrid recommendation approaches.
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Sudeep Das, Ph.D.
In this talk, we will provide an overview of Deep Learning methods applied to personalization and search at Netflix. We will set the stage by describing the unique challenges faced at Netflix in the areas of recommendations and information retrieval. Then we will delve into how we leverage a blend of traditional algorithms and emergent deep learning methods and new types of embeddings, especially hyperbolic space embeddings, to address these challenges.
Making Netflix Machine Learning Algorithms Reliable Justin Basilico
This document discusses making Netflix machine learning algorithms reliable. It describes how Netflix uses machine learning for tasks like personalized ranking and recommendation. The goals are to maximize member satisfaction and retention. The models and algorithms used include regression, matrix factorization, neural networks, and bandits. The key aspects of making the models reliable discussed are: automated retraining of models, testing training pipelines, checking models and inputs online for anomalies, responding gracefully to failures, and training models to be resilient to different conditions and failures.
Recommender Systems (Machine Learning Summer School 2014 @ CMU) Xavier Amatriain
The document summarizes a presentation on recommender systems given by Xavier Amatriain. It begins with introductions to recommender systems and collaborative filtering. Traditional collaborative filtering approaches include user-based and item-based methods. User-based CF finds similar users to a target user and recommends items they liked. Item-based CF finds similar items to those a target user liked and predicts ratings. Both approaches address sparsity and scalability challenges with dimensionality reduction techniques.
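As a rough illustration of the user-based approach described above, the sketch below predicts a rating as a similarity-weighted average over the most similar users who rated the item. Cosine similarity, the neighborhood size `k`, and the toy ratings are assumptions chosen for the example; the summarized presentation may use different similarity measures or normalizations.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, item, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings for item."""
    neighbors = [
        (cosine_similarity(ratings[user], other), other[item])
        for name, other in ratings.items()
        if name != user and item in other
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = [(sim, rating) for sim, rating in neighbors[:k] if sim > 0]
    if not top:
        return None  # no similar user has rated this item
    return sum(sim * rating for sim, rating in top) / sum(sim for sim, _ in top)

# Toy ratings (assumed data).
ratings = {
    "alice": {"m1": 5, "m2": 4},
    "bob": {"m1": 5, "m2": 4, "m3": 5},
    "carol": {"m2": 1, "m3": 1},
}
print(predict_rating(ratings, "alice", "m3"))
```

Item-based CF follows the same pattern with the roles of users and items swapped: similarities are computed between item rating vectors instead.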
Personalized Page Generation for Browsing Recommendations Justin Basilico
Talk from First Workshop on Recommendation Systems for TV and Online Video at RecSys 2014 in Foster City, CA on 2014-10-10 about how we personalize the layout of the Netflix homepage to make it easier for people to browse the recommendations to quickly find something to watch and enjoy.
This document summarizes a presentation about personalizing artwork selection on Netflix using multi-armed bandit algorithms. Bandit algorithms were applied to choose representative, informative and engaging artwork for each title to maximize member satisfaction and retention. Contextual bandits were used to personalize artwork selection based on member preferences and context. Netflix deployed a system that precomputes personalized artwork using bandit models and caches the results to serve images quickly at scale. The system was able to lift engagement metrics based on A/B tests of the personalized artwork selection models.
This document summarizes a presentation given by Xavier Amatriain from Netflix on their recommendation system and personalization techniques. Netflix uses a variety of machine learning models like SVD, RBMs, and linear regression to make personalized recommendations. They also personalize other aspects of the user experience like rankings, genres, and similar item suggestions. Netflix collects massive amounts of user data from ratings, searches, and streaming to train these models. The goal is to provide high quality recommendations that are accurate, novel, diverse, and increase user engagement.
Data council SF 2020 Building a Personalized Messaging System at Netflix Grace T. Huang
This document discusses building a personalized messaging system at Netflix to recommend content to users. It covers four key considerations:
1) Personalizing messaging decisions using classification techniques like logistic regression on outcome features.
2) Removing bias from the system using techniques like Thompson sampling, exploration-exploitation, and propensity correction.
3) Maximizing causal impact by explicitly modeling past actions and comparing member satisfaction with and without messages.
4) Balancing reward against cost by imposing a volume constraint like an incrementality threshold and using reinforcement learning approaches.
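Thompson sampling, named in item 2 above, can be sketched for the simplest case of binary rewards. The example assumes each arm (for instance, a candidate message) has a Bernoulli reward with a Beta(1, 1) prior; the class name, the simulated success rates, and the seeds are hypothetical and do not describe the system in the talk.

```python
import random

class BetaBernoulliThompson:
    """Thompson sampling with an independent Beta posterior per arm."""

    def __init__(self, n_arms, seed=None):
        self.successes = [1] * n_arms  # Beta(1, 1) prior: alpha = 1
        self.failures = [1] * n_arms   # Beta(1, 1) prior: beta = 1
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one plausible success rate per arm and play the best draw.
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

def simulate(true_rates, rounds=2000, seed=0):
    """Run the bandit against simulated Bernoulli arms; return pulls per arm."""
    rng = random.Random(seed)
    bandit = BetaBernoulliThompson(len(true_rates), seed=seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = bandit.select_arm()
        reward = rng.random() < true_rates[arm]
        bandit.update(arm, reward)
        pulls[arm] += 1
    return pulls

print(simulate([0.2, 0.8]))
```

Because arms are chosen by sampling from the posterior rather than by taking the current best estimate, weaker arms keep receiving occasional traffic, which is what supplies the exploration data needed for propensity correction.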
The document summarizes techniques for handling missing values in recommender models. It discusses how gradient boosted decision trees (GBDTs) and neural networks (NNs) can deal with missing features during training without imputing values. For GBDTs, XGBoost and R's GBM handle missing values differently, with XGBoost sending examples left or right and GBM using a ternary split. NNs can handle missing features via techniques like dropout, imputing averages, or including a "missing" embedding value. The document concludes that the optimal approach depends on the dataset.
The document discusses cross-validation, which is used to estimate how well a machine learning model will generalize to unseen data. It defines cross-validation as splitting a dataset into training and test sets to train a model on the training set and evaluate it on the held-out test set. Common types of cross-validation discussed are k-fold cross-validation, which repeats the process by splitting the data into k folds, and repeated holdout validation, which randomly samples subsets for training and testing over multiple repetitions.
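The k-fold procedure described above can be sketched with the standard library alone. The function name `k_fold_indices` and the seed are assumptions for illustration; in practice a library routine such as scikit-learn's `KFold` would normally be used instead.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in k_fold_indices(n_samples=10, k=5):
    print(sorted(test), "held out;", len(train), "used for training")
```

A model would be fit on each `train` split and scored on the matching `test` split, and the k scores averaged to estimate generalization performance.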
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band... Justin Basilico
Talk from the REVEAL workshop at RecSys 2019 on 2019-09-20 in Copenhagen, Denmark. The slides were primarily made by Ajinkya More and the paper was also joint work with Linas Baltrunas and Nikos Vlassis.
The paper is available here: https://drive.google.com/open?id=1oaM5Fu2bJ0GzMC09yyqjA7eZD9axzSKb
Fact Store at Scale for Netflix Recommendations with Nitin Sharma and Kedar S... Databricks
As a data-driven company, we use machine learning algorithms and A/B tests to drive all of the content recommendations for our members. To improve the quality of our personalized recommendations, we first try an idea offline using historical data. Ideas that improve our offline metrics are then pushed as A/B tests, which are measured through statistically significant improvements in core metrics such as member engagement, satisfaction, and retention. At the heart of such offline analyses is the historical fact data used to generate the features required by the machine learning model, for example, the viewing history of a member or the videos in mylist.
Building a fact store at an ever-evolving Netflix scale is nontrivial. An important requirement is to capture enough fact data to cover all stratification needs of various experiments and to guarantee that the data we serve is temporally accurate. In this talk, we will present the key requirements, the evolution of our fact store design, its implementation, the scale, and our learnings.
We will also take a deep dive into fact vs feature logging, design tradeoffs, infrastructure performance, reliability, and the query API for the store. We use Spark and Scala extensively and a variety of compression techniques to store and retrieve data efficiently.
Lessons Learned from Building Machine Learning Software at Netflix Justin Basilico
Talk from Software Engineering for Machine Learning Workshop (SW4ML) at the Neural Information Processing Systems (NIPS) 2014 conference in Montreal, Canada on 2014-12-13.
Abstract:
Building a real system that incorporates machine learning as a part can be a difficult effort, both in terms of the algorithmic and engineering challenges involved. In this talk I will focus on the engineering side and discuss some of the practical issues we’ve encountered in developing real machine learning systems at Netflix and some of the lessons we’ve learned over time. I will describe our approach for building machine learning systems and how it comes from a desire to balance many different, and sometimes conflicting, requirements such as handling large volumes of data, choosing and adapting good algorithms, keeping recommendations fresh and accurate, remaining responsive to user actions, and also being flexible to accommodate research and experimentation. I will focus on what it takes to put machine learning into a real system that works in a feedback loop with our users and how that imposes different requirements and a different focus than doing machine learning only within a lab environment. I will address the particular software engineering challenges that we’ve faced in running our algorithms at scale in the cloud. I will also mention some simple design patterns that we’ve found to be useful across a wide variety of machine-learned systems.
This document discusses Gaussian process bandit optimization, which is a method for adaptively sampling an unknown function to maximize its value. It proposes using an upper confidence bound (UCB) approach, where samples are selected to maximize an upper bound on the function value while also exploring uncertain regions. The key points are:
1) It proves regret bounds for UCB in this setting that depend on how quickly information can be gained about the function from samples, known as the maximal information gain.
2) This connects Gaussian process bandit optimization to Bayesian experimental design, which aims to maximize information gain.
3) Experiments on temperature and traffic data show the UCB approach performs comparably to existing heuristics while providing the
This document discusses Gaussian process bandit optimization, which is a framework for adaptively sampling an unknown function to maximize information gain. It proposes using an Upper Confidence Bound (UCB) approach, where samples are selected to maximize an upper bound on the function value while balancing exploration and exploitation. The key results establish that the regret of this approach depends on the rate at which information can be gained about the function, which captures its "learnability." Experimental results on temperature and traffic data demonstrate the UCB approach performs comparably to existing heuristics.
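Neither summary above states formulas, so the following is only the form in which the GP-UCB selection rule and its regret bound are commonly written in the literature. The symbols are assumptions introduced here: $\mu_{t-1}$ and $\sigma_{t-1}$ are the Gaussian process posterior mean and standard deviation after $t-1$ observations, $\beta_t$ is a confidence parameter, $D$ is the decision set, and $\gamma_T$ is the maximal information gain after $T$ rounds.

```latex
% Selection rule: pick the point with the highest upper confidence bound.
x_t = \arg\max_{x \in D} \; \mu_{t-1}(x) + \sqrt{\beta_t}\, \sigma_{t-1}(x)

% Cumulative regret and its dependence on the maximal information gain.
R_T = \sum_{t=1}^{T} \bigl( f(x^\ast) - f(x_t) \bigr)
    = O^\ast\!\left( \sqrt{T \, \beta_T \, \gamma_T} \right),
\qquad
\gamma_T = \max_{A \subset D,\; |A| = T} I(y_A; f_A)
```

The first term of the selection rule favors points believed to be good (exploitation) and the second favors points where the model is uncertain (exploration). The bound is sublinear in $T$ whenever $\gamma_T$ grows slowly enough, which is the "learnability" condition the summary refers to.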
Simulators play a major role in analyzing multi-modal transportation networks. As their complexity increases, optimization becomes an increasingly challenging task. Current calibration procedures often rely on heuristics, rules of thumb and sometimes on brute-force search. As an alternative, we provide a statistical method which combines a distributed Gaussian Process Bayesian optimization method with dimensionality reduction techniques and structural improvement. Our framework is sample efficient and supported by theoretical analysis and an empirical study. We demonstrate it on the problem of calibrating a multi-modal transportation network of the city of Bloomington, Illinois. Finally, we discuss directions for further research.
This document discusses reinforcement learning and the multi-armed bandit problem. It introduces concepts like the epsilon-greedy algorithm, which balances exploiting the best known action and exploring potentially better actions. It also discusses updating action values using temporal difference learning and the exploration strategy of using upper confidence bounds. Policy-based algorithms are introduced, which assign preference values to actions and select probabilistically using the softmax function. The preferences can then be updated using policy gradient ascent based on observed rewards.
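A minimal sketch of two of the ideas above: epsilon-greedy action selection with incremental action-value updates, and the softmax function that preference-based methods use to turn preferences into selection probabilities. The Gaussian reward model, the arm means, and the function names are assumptions made for the example and do not come from the summarized document.

```python
import math
import random

def softmax(preferences):
    """Convert action preferences into selection probabilities."""
    m = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(h - m) for h in preferences]
    total = sum(exps)
    return [e / total for e in exps]

def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play a Gaussian bandit; return estimated values and pull counts."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    q = [0.0] * n_arms      # estimated value of each action
    counts = [0] * n_arms   # number of times each action was taken
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                   # explore
        else:
            arm = max(range(n_arms), key=q.__getitem__)   # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update: Q <- Q + (r - Q) / n
        q[arm] += (reward - q[arm]) / counts[arm]
    return q, counts

estimates, pulls = run_epsilon_greedy([0.0, 0.5, 2.0])
print(estimates, pulls)
print(softmax([0.0, 1.0, 2.0]))
```

A policy-gradient bandit would sample actions from `softmax(preferences)` and nudge the preferences up or down depending on whether the observed reward was above or below a running baseline.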
Bayesian Optimization for Balancing Metrics in Recommender Systems Viral Gupta
Most large-scale online recommender systems like newsfeed ranking, people recommendations, job recommendations, etc. often have multiple utilities or metrics that need to be simultaneously optimized. The machine learning models that are trained to optimize a single utility are combined together through parameters to generate the final ranking function. These combination parameters drive business metrics. Finding the right choice of the parameters is often done through online A/B experimentation, which can be incredibly complex and time-consuming, especially considering the non-linear effects of these parameters on the metrics of interest.
In this tutorial, we will talk about how we can apply Bayesian Optimization techniques to obtain the parameters for such complex online systems in order to balance the competing metrics. First, we will provide an in-depth introduction to Bayesian Optimization, covering some of the basics as well as the recent advances in the field. Second, we will talk about how to formulate a real-world recommender system problem as a black-box optimization problem that can be solved via Bayesian Optimization. We will focus on a few key problems such as newsfeed ranking, people recommendations, job recommendations, etc. Third, we will talk about the architecture of the solution and how we are able to deploy it for large-scale systems. Finally, we will discuss the extensions and some of the future directions in this domain.
Meta-learning of exploration-exploitation strategies in reinforcement learningUniversité de Liège (ULg)
The document discusses meta-learning of exploration-exploitation strategies in reinforcement learning. It proposes learning exploration strategies rather than relying on predefined formulas. The approach involves defining training problems, candidate strategies, a performance criterion, and optimizing strategies on the training problems using an optimization tool. Simulation results show learned strategies outperform strategies based on predefined formulas when the test problems match the training problems. The document provides examples applying this approach to multi-armed bandits and tree search problems.
Learning for exploration-exploitation in reinforcement learning. The dusk of ...Université de Liège (ULg)
The document discusses reinforcement learning and the exploration-exploitation tradeoff that agents face. It proposes learning exploration-exploitation strategies rather than relying on predefined formulas. The approach defines training problems, candidate strategies parameterized by formulas, a performance criterion, and optimizes strategies on training problems using an estimation of distribution algorithm. Simulation results show learned strategies outperform strategies from common formulas on matching test problems.
DataScienceLab2017_Hyperparameter optimization for machine learning using ...GeeksLab Odessa
DataScienceLab, May 13, 2017
Hyperparameter Optimization for Machine Learning Using Bayesian Optimization
Максим Бевза (Research Engineer at Grammarly)
All machine learning algorithms need tuning. We often use Grid Search, Randomized Search, or our intuition to select hyperparameters. Bayesian optimization helps us steer Randomized Search toward the most promising regions, so that we get the same (or a better) result in fewer iterations.
All materials: http://datascience.in.ua/report2017
Scott Clark, Software Engineer, Yelp at MLconf SFMLconf
Abstract: Introducing the Metric Optimization Engine (MOE); an open source, black box, Bayesian Global Optimization engine for optimal experimental design.
In this talk we will introduce MOE, the Metric Optimization Engine. MOE is an efficient way to optimize a system’s parameters, when evaluating parameters is time-consuming or expensive. It can be used to help tackle a myriad of problems including optimizing a system’s click-through or conversion rate via A/B testing, tuning parameters of a machine learning prediction method or expensive batch job, designing an engineering system or finding the optimal parameters of a real-world experiment.
MOE is ideal for problems in which the optimization problem’s objective function is a black box, not necessarily convex or concave, derivatives are unavailable, and we seek a global optimum, rather than just a local one. This ability to handle black-box objective functions allows us to use MOE to optimize nearly any system, without requiring any internal knowledge or access. To use MOE, we simply need to specify some objective function, some set of parameters, and any historical data we may have from previous evaluations of the objective function. MOE then finds the set of parameters that maximize (or minimize) the objective function, while evaluating the objective function as few times as possible. This is done internally using Bayesian Global Optimization on a Gaussian Process model of the underlying system and finding the points of highest Expected Improvement to sample next. MOE provides easy to use Python, C++, CUDA and REST interfaces to accomplish these goals and is fully open source. We will present the motivation and background, discuss the implementation and give real-world examples.
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...SigOpt
This document summarizes a presentation given by Michael McCourt of SigOpt at the 2nd Annual Workshop on Offline and Online Evaluation of Interactive Systems at KDD 2019. The presentation discusses using Bayesian optimization to efficiently explore the tradeoffs between competing offline metrics. It proposes posing the problem as a constrained multi-objective optimization to explore the Pareto efficient frontier. It describes allowing users to interactively update the constraints as the search progresses to account for changes in their goals. Various strategies for exploring the efficient frontier are discussed, including randomly or systematically varying the constraints. Applications of Bayesian optimization are highlighted, including for materials and model design. Future work directions are proposed, such as better handling black-box constraints.
Probabilistic machine learning for optimization and solving complexData Science Leuven
This document discusses probabilistic machine learning techniques for optimization and solving complex problems. It introduces Bayesian parametric and nonparametric models that can marginalize over weights and consider a continuous range of potential models. Gaussian processes are discussed as a way to put a prior directly on functions. Dynamic programming approaches are presented for expensive optimization problems by splitting the task, deciding next evaluation locations, and optimizing an acquisition function. Results are shown for using causality techniques to remove systematic errors in finding exoplanets from lengthy space telescope recordings with rare signals and corrupted data. The key takeaways are that probabilistic models provide confidence by knowing what they don't know, multidisciplinary teams are needed to tackle complex cases, and that theory is important to
This document discusses automatic model tuning techniques for hyperparameter optimization. It covers search-based approaches like grid search and random search, as well as their limitations due to the computational expense of evaluating every hyperparameter combination. Bayesian optimization techniques are proposed to overcome this by using a surrogate model and acquisition function to iteratively suggest new hyperparameter configurations to evaluate. Amazon SageMaker Automatic Model Tuning applies Bayesian optimization approaches for efficient hyperparameter tuning in the cloud.
Horizon: Deep Reinforcement Learning at ScaleDatabricks
To build a decision-making system, we must provide answers to two sets of questions: (1) "What will happen if I make decision X?" and (2) "How should I pick which decision to make?".
Typically, the first set of questions is answered with supervised learning: we build models to forecast whether someone will click on an ad, or visit a post. The second set of questions is more open-ended. In this talk, we will dive into how we can answer "how" questions, starting with heuristics and search. This will lead us to bandits, reinforcement learning, and Horizon: an open-source platform for training and deploying reinforcement learning models at massive scale. At Facebook, we are using Horizon, built using PyTorch 1.0 and Apache Spark, in a variety of AI-related and control tasks, spanning recommender systems, marketing & promotion distribution, and bandwidth optimization.
The talk will cover the key components of Horizon and the lessons we learned along the way that influenced the development of the platform.
Author: Jason Gauci
Abstract:
In this talk we will introduce MOE, the Metric Optimization Engine. MOE is an efficient way to optimize a system's parameters, when evaluating parameters is time-consuming or expensive. It can be used to help tackle a myriad of problems including optimizing a system's click-through or conversion rate via A/B testing, tuning parameters of a machine learning prediction method or expensive batch job, designing an engineering system or finding the optimal parameters of a real-world experiment.
MOE is ideal for problems in which the optimization problem's objective function is a black box, not necessarily convex or concave, derivatives are unavailable, and we seek a global optimum, rather than just a local one. This ability to handle black-box objective functions allows us to use MOE to optimize nearly any system, without requiring any internal knowledge or access. To use MOE, we simply need to specify some objective function, some set of parameters, and any historical data we may have from previous evaluations of the objective function. MOE then finds the set of parameters that maximize (or minimize) the objective function, while evaluating the objective function as few times as possible. This is done internally using Bayesian Global Optimization on a Gaussian Process model of the underlying system and finding the points of highest Expected Improvement to sample next. MOE provides easy to use Python, C++, CUDA and REST interfaces to accomplish these goals and is fully open source. We will present the motivation and background, discuss the implementation and give real-world examples.
Scott Clark Bio:
After finishing my PhD in Applied Mathematics at Cornell University in 2012 I have been working on the Ad Targeting team at Yelp Inc. I've been employing a variety of machine learning and optimization techniques from multi-armed bandits to Bayesian Global Optimization and beyond to their vast dataset and problems. I have also been trying to lead the charge on academic research and outreach within Yelp by leading projects like the Yelp Dataset Challenge and open sourcing MOE.
This document discusses reinforcement learning and its applications to optimization problems in marketing. It begins with definitions of reinforcement learning and multi-armed bandit problems. It then discusses how Bayesian AB testing, multi-armed bandits, and Thompson sampling can be used to solve single decision problems. The document also covers how reinforcement learning handles more complex multi-touchpoint optimization and attribution problems using techniques like Q-learning. It concludes by discussing how reinforcement learning approaches can be used for automation and predictive targeting based on user attributes.
The document discusses big data challenges and potential solutions. It begins by outlining how big data is generated from various sources and used in applications like search engines. The main challenges are determining which subset of big data to analyze and how to clean noisy data. Two potential solutions discussed are:
1) Intelligent sampling to determine a representative subset of data to analyze instead of the entire dataset, in order to improve running time. Adaptive sampling techniques like IDASA are proposed.
2) Filtering techniques like ensemble filtering use multiple models to identify and remove mislabeled instances from training data, in order to improve predictive accuracy by cleaning the data. Bayesian analysis can interpret filtering as a form of model averaging.
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Pooyan Jamshidi
Modern software systems are now being built to be used in dynamic environments utilizing configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and, therefore, we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost.
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdftaeseon ryu
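To maximize the expected return during training, it is important to strike a good balance between exploration and exploitation in an unknown environment. The Bayes-optimal policy, which does this ideally, chooses actions based not only on the environment state but also on the agent's uncertainty about the environment. However, computing the Bayes-optimal policy is difficult even for small tasks. This paper introduces 'variational Bayes-Adaptive Deep RL' (variBAD), a method that performs approximate inference in an unknown environment and incorporates that uncertainty directly into action selection.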
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...James Anderson
The Quantum Apocalypse: A Looming Threat & The Need for Post-Quantum Encryption
We explore the imminent risks posed by quantum computing to modern encryption standards and the urgent need for post-quantum cryptography (PQC).
Bio: With 30 years in cybersecurity, including as a CISO, Tommy is a strategic leader driving security transformation, risk management, and program maturity. He has led high-performing teams, shaped industry policies, and advised organizations on complex cyber, compliance, and data protection challenges.
Droidal: AI Agents Revolutionizing HealthcareDroidal LLC
Droidal’s AI Agents are transforming healthcare by bringing intelligence, speed, and efficiency to key areas such as Revenue Cycle Management (RCM), clinical operations, and patient engagement. Built specifically for the needs of U.S. hospitals and clinics, Droidal's solutions are designed to improve outcomes and reduce administrative burden.
Through simple visuals and clear examples, the presentation explains how AI Agents can support medical coding, streamline claims processing, manage denials, ensure compliance, and enhance communication between providers and patients. By integrating seamlessly with existing systems, these agents act as digital coworkers that deliver faster reimbursements, reduce errors, and enable teams to focus more on patient care.
Droidal's AI technology is more than just automation — it's a shift toward intelligent healthcare operations that are scalable, secure, and cost-effective. The presentation also offers insights into future developments in AI-driven healthcare, including how continuous learning and agent autonomy will redefine daily workflows.
Whether you're a healthcare administrator, a tech leader, or a provider looking for smarter solutions, this presentation offers a compelling overview of how Droidal’s AI Agents can help your organization achieve operational excellence and better patient outcomes.
A free demo trial is available for those interested in experiencing Droidal’s AI Agents firsthand. Our team will walk you through a live demo tailored to your specific workflows, helping you understand the immediate value and long-term impact of adopting AI in your healthcare environment.
To request a free trial or learn more:
https://droidal.com/
Annual (33 years) study of the Israeli enterprise / public IT market. It covers sections on the Israeli economy and IT trends 2026-28, several surveys (AI, CDOs, OCIO, CTO, staffing, cyber, operations and infra), rankings of 760 vendors across 160 markets (market sizes and trends), and a comparison of products according to support and market penetration.
Dev Dives: System-to-system integration with UiPath API WorkflowsUiPathCommunity
Join the next Dev Dives webinar on May 29 for a first contact with UiPath API Workflows, a powerful tool purpose-fit for API integration and data manipulation!
This session will guide you through the technical aspects of automating communication between applications, systems and data sources using API workflows.
📕 We'll delve into:
- How this feature delivers API integration as a first-party concept of the UiPath Platform.
- How to design, implement, and debug API workflows to integrate with your existing systems seamlessly and securely.
- How to optimize your API integrations with runtime built for speed and scalability.
This session is ideal for developers looking to solve API integration use cases with the power of the UiPath Platform.
👨🏫 Speakers:
Gunter De Souter, Sr. Director, Product Manager @UiPath
Ramsay Grove, Product Manager @UiPath
This session streamed live on May 29, 2025, 16:00 CET.
Check out all our upcoming UiPath Dev Dives sessions:
👉 https://community.uipath.com/dev-dives-automation-developer-2025/
Create Your First AI Agent with UiPath Agent BuilderDianaGray10
Join us for an exciting virtual event where you'll learn how to create your first AI Agent using UiPath Agent Builder. This session will cover everything you need to know about what an agent is and how easy it is to create one using the powerful AI-driven UiPath platform. You'll also discover the steps to successfully publish your AI agent. This is a wonderful opportunity for beginners and enthusiasts to gain hands-on insights and kickstart their journey in AI-powered automation.
Introduction and Background:
Study Overview and Methodology: The study analyzes the IT market in Israel, covering over 160 markets and 760 companies/products/services. It includes vendor rankings, IT budgets, and trends from 2025-2029. Vendors participate in detailed briefings and surveys.
Vendor Listings: The presentation lists numerous vendors across various pages, detailing their names and services. These vendors are ranked based on their participation and market presence.
Market Insights and Trends: Key insights include IT market forecasts, economic factors affecting IT budgets, and the impact of AI on enterprise IT. The study highlights the importance of AI integration and the concept of creative destruction.
Agentic AI and Future Predictions: Agentic AI is expected to transform human-agent collaboration, with AI systems understanding context and orchestrating complex processes. Future predictions include AI's role in shopping and enterprise IT.
Supercharge Your AI Development with Local LLMsFrancesco Corti
In today's AI development landscape, developers face significant challenges when building applications that leverage powerful large language models (LLMs) through SaaS platforms like ChatGPT, Gemini, and others. While these services offer impressive capabilities, they come with substantial costs that can quickly escalate especially during the development lifecycle. Additionally, the inherent latency of web-based APIs creates frustrating bottlenecks during the critical testing and iteration phases of development, slowing down innovation and frustrating developers.
This talk will introduce the transformative approach of integrating local LLMs directly into their development environments. By bringing these models closer to where the code lives, developers can dramatically accelerate development lifecycles while maintaining complete control over model selection and configuration. This methodology effectively reduces costs to zero by eliminating dependency on pay-per-use SaaS services, while opening new possibilities for comprehensive integration testing, rapid prototyping, and specialized use cases.
Cyber Security Legal Framework in Nepal.pptxGhimire B.R.
The presentation reviews the existing legal framework on cyber security in Nepal. It highlights the strengths and weaknesses of the major acts and policies so far, and further highlights the need for a data protection act.
6th Power Grid Model Meetup
Join the Power Grid Model community for an exciting day of sharing experiences, learning from each other, planning, and collaborating.
This hybrid in-person/online event will include a full day agenda, with the opportunity to socialize afterwards for in-person attendees.
If you have a hackathon proposal, tell us when you register!
About Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
Securiport is a border security systems provider with a progressive team approach to its task. The company acknowledges the importance of specialized skills in creating the latest in innovative security tech. The company has offices throughout the world to serve clients, and its employees speak more than twenty languages at the Washington D.C. headquarters alone.
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....Jasper Oosterveld
Sensitivity labels, powered by Microsoft Purview Information Protection, serve as the foundation for classifying and protecting your sensitive data within Microsoft 365. Their importance extends beyond classification: they play a crucial role in enforcing governance policies across your Microsoft 365 environment. Join me, a Data Security Consultant and Microsoft MVP, as I share practical tips and tricks to get the full potential of sensitivity labels. I discuss sensitive information types, automatic labeling, and seamless integration with Data Loss Prevention, Teams Premium, and Microsoft 365 Copilot.
Co-Constructing Explanations for AI Systems using ProvenancePaul Groth
Explanation is not a one-off: it's a process where people and systems work together to gain understanding. This idea of co-constructing explanations, or explanation by exploration, is a powerful way to frame the problem of explanation. In this talk, I discuss our first experiments with this approach for explaining complex AI systems by using provenance. Importantly, I discuss the difficulty of evaluation and discuss some of our first approaches to evaluating these systems at scale. Finally, I touch on the importance of explanation to the comprehensive evaluation of AI systems.
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...Aaryan Kansari
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generative AI
Discover Agentic AI, the revolutionary step beyond reactive generative AI. Learn how these autonomous systems can reason, plan, execute, and adapt to achieve human-defined goals, acting as digital co-workers. Explore its promise, key frameworks like LangChain and AutoGen, and the challenges in designing reliable and safe AI agents for future workflows.
Sticky Note Bullets:
Definition: Next stage beyond ChatGPT-like systems, offering true autonomy.
Core Function: Can "reason, plan, execute and adapt" independently.
Distinction: Proactive (sets own actions for goals) vs. Reactive (responds to prompts).
Promise: Acts as "digital co-workers," handling grunt work like research, drafting, bug fixing.
Industry Outlook: Seen as a game-changer; Deloitte predicts 50% of companies using GenAI will have agentic AI pilots by 2027.
Key Frameworks: LangChain, Microsoft's AutoGen, LangGraph, CrewAI.
Development Focus: Learning to think in workflows and goals, not just model outputs.
Challenges: Ensuring reliability, safety; agents can still hallucinate or go astray.
Best Practices: Start small, iterate, add memory, keep humans in the loop for final decisions.
Use Cases: Limited only by imagination (e.g., drafting business plans, complex simulations).
Microsoft Build 2025 takeaways in one presentationDigitalmara
Microsoft Build 2025 introduced significant updates. Everything revolves around AI. DigitalMara analyzed these announcements:
• AI enhancements for Windows 11
By embedding AI capabilities directly into the OS, Microsoft is lowering the barrier for users to benefit from intelligent automation without requiring third-party tools. It's a practical step toward improving user experience, such as streamlining workflows and enhancing productivity. However, attention should be paid to data privacy, user control, and transparency of AI behavior. The implementation policy should be clear and ethical.
• GitHub Copilot coding agent
The introduction of coding agents is a meaningful step in everyday AI assistance. However, it still brings challenges. Some people compare agents with junior developers. They noted that while the agent can handle certain tasks, it often requires supervision and can introduce new issues. This innovation holds both potential and limitations. Balancing automation with human oversight is crucial to ensure quality and reliability.
• Introduction of Natural Language Web
NLWeb is a significant step toward a more natural and intuitive web experience. It can help users access content more easily and reduce reliance on traditional navigation. The open-source foundation provides developers with the flexibility to implement AI-driven interactions without rebuilding their existing platforms. NLWeb is a promising level of web interaction that complements, rather than replaces, well-designed UI.
• Introduction of Model Context Protocol
MCP provides a standardized method for connecting AI models with diverse tools and data sources. This approach simplifies the development of AI-driven applications, enhancing efficiency and scalability. Its open-source nature encourages broader adoption and collaboration within the developer community. Nevertheless, MCP can face challenges in compatibility across vendors and security in context sharing. Clear guidelines are crucial.
• Windows Subsystem for Linux is open-sourced
It's a positive step toward greater transparency and collaboration in the developer ecosystem. The community can now contribute to its evolution, helping identify issues and expand functionality faster. However, open-source software in a core system also introduces concerns around security, code quality management, and long-term maintenance. Microsoft’s continued involvement will be key to ensuring WSL remains stable and secure.
• Azure AI Foundry platform hosts Grok 3 AI models
Adding new models is a valuable expansion of AI development resources available at Azure. This provides developers with more flexibility in choosing language models that suit a range of application sizes and needs. Hosting on Azure makes access and integration easier when using Microsoft infrastructure.
Neural representations have shown the potential to accelerate ray casting in a conventional ray-tracing-based rendering pipeline. We introduce a novel approach called Locally-Subdivided Neural Intersection Function (LSNIF) that replaces bottom-level BVHs used as traditional geometric representations with a neural network. Our method introduces a sparse hash grid encoding scheme incorporating geometry voxelization, a scene-agnostic training data collection, and a tailored loss function. It enables the network to output not only visibility but also hit-point information and material indices. LSNIF can be trained offline for a single object, allowing us to use LSNIF as a replacement for its corresponding BVH. With these designs, the network can handle hit-point queries from any arbitrary viewpoint, supporting all types of rays in the rendering pipeline. We demonstrate that LSNIF can render a variety of scenes, including real-world scenes designed for other path tracers, while achieving a memory footprint reduction of up to 106.2x compared to a compressed BVH.
https://arxiv.org/abs/2504.21627
New Ways to Reduce Database Costs with ScyllaDBScyllaDB
How ScyllaDB’s latest capabilities can reduce your infrastructure costs
ScyllaDB has been obsessed with price-performance from day 1. Our core database is architected with low-level engineering optimizations that squeeze every ounce of power from the underlying infrastructure. And we just completed a multi-year effort to introduce a set of new capabilities for additional savings.
Join this webinar to learn about these new capabilities: the underlying challenges we wanted to address, the workloads that will benefit most from each, and how to get started. We’ll cover ways to:
- Avoid overprovisioning with “just-in-time” scaling
- Safely operate at up to ~90% storage utilization
- Cut network costs with new compression strategies and file-based streaming
We’ll also highlight a “hidden gem” capability that lets you safely balance multiple workloads in a single cluster. To conclude, we will share the efficiency-focused capabilities on our short-term and long-term roadmaps.
Facebook Talk at Netflix ML Platform meetup Sep 2019
1. Practical Solutions to Exploration Problems
Sam Daulton
Core Data Science, Facebook
Adaptive Experimentation Practical Solutions to Exploration Problems 1 / 68
2. Overview
1 Adaptive Experimentation
    Introduction
2 Direct policy search via Bayesian optimization
    Motivating Example
    Gaussian Process Regression
    Bayesian Optimization
3 Combining online and offline experiments
    Value Model Tuning
    Multi-Task Bayesian Optimization
4 Open Source Tools
    Ax
    BoTorch
5 Constrained Bayesian Contextual Bandits
    Video Upload Transcoding Optimization
    Constrained Thompson Sampling (CTS)
    Reward Shaping and Hyperparameter Optimization
3. Adaptive Experimentation Team
• Horizontal R&D team within
Facebook
• Goal: radically change the way
people run experiments and
develop systems:
• Reduce threshold for
experimentation
• Use RL to robustly solve
explore/exploit problems
• Develop tools to improve and
automate decision-making
under multiple and/or
constrained objectives
8. Homogeneous Status Quo Policy
9. Homogeneous Status Quo Policy
Idea: What if we loaded different numbers of stories depending on the
connection type?
10. Potential Contextualized Policy
Idea: What if we loaded more posts for better connection types?
11. Potential Contextualized Policy - Opposite
Idea: What if we loaded fewer posts for better connection types?
12. Potential Contextualized Policies
Suppose that for each connection type c:
• We could fetch any number of posts $x_c \in [2, 24]$
• Then there are $22^4 = 234{,}256$ possible configurations to test!
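The count above follows from the product rule. A minimal sketch, assuming 22 candidate values per connection type and 4 connection types (both numbers are read off the slide's arithmetic; the slide does not list the connection types):

```python
from itertools import product

# Assumptions taken from the slide's arithmetic, not stated explicitly:
# 22 candidate values per connection type, 4 connection types.
values_per_type = 22
connection_types = 4

# Each connection type is set independently, so the counts multiply.
n_configurations = values_per_type ** connection_types
print(n_configurations)  # 234256

# Enumerating a tiny version (3 values, 2 types) shows the same product rule.
small_grid = list(product(range(3), repeat=2))
print(len(small_grid))  # 9
```

Even at this modest size, running one A/B arm per configuration is out of reach, which is what motivates treating the policy table as the input of a black-box function.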
13. Policies as Black-box Functions
The average treatment effect over all individuals can be expected to be
some smooth function of the policy table $x = [x_1, \ldots, x_k]$:
$f(x): \mathbb{R}^k \to \mathbb{R}$
14. Black-box Function View of RL
• Turns "full RL" problem into an infinite-armed bandit problem
$\pi_{x^*} = \arg\max_x g(f(x))$
• Advantages:
• Does not require estimating value functions, state transition functions,
or inference about unobserved states
• Involves virtually no logging of actions, states, or intermediate rewards
• Allows for direct maximization of multiple, delayed rewards
Question: How can we make predictions about long-term outcomes from
a limited number of vector-valued policies?
15. Gaussian Process (GP) Priors
16. Gaussian Process (GP) Priors
17. Gaussian Process (GP) Posteriors
18. Gaussian Process (GP) Posteriors
19. Gaussian Process (GP) Posteriors
GP regression gives well-calibrated posterior predictive intervals that are
easy to compute
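For reference, these intervals come from the standard GP regression posterior. The expressions below assume a zero prior mean, a kernel $k$, and i.i.d. Gaussian observation noise with variance $\sigma^2$; the slides do not state which kernel or noise model was used.

```latex
\mu(x_*) = k_*^\top (K + \sigma^2 I)^{-1} y,
\qquad
s^2(x_*) = k(x_*, x_*) - k_*^\top (K + \sigma^2 I)^{-1} k_*,
\qquad
K_{ij} = k(x_i, x_j), \quad (k_*)_i = k(x_*, x_i)
```

Under these assumptions, an approximate 95% interval for $f(x_*)$ is $\mu(x_*) \pm 1.96\, s(x_*)$.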
20. Gaussian Process (GP) Regression
In practice, we find that GP surrogate models fit the data well for many
online experiments.
21. Other Examples with Continuous Action Spaces
• Value models governing ranking policies: e.g.
$\mathrm{rank}(Z) = x_1 P(\mathrm{click} \mid Z) + x_2 Z_{\text{num friends}}^{x_3} + f(P(\mathrm{spam} \mid Z)/x_4) + \ldots$
• Bit-rate controllers for video and audio streaming
• Data retrieval policies for ML backends
Question: How do we use GP surrogate models to guide the
explore-exploit trade-off?
28. Bayesian Optimization
Response surface is maximized sequentially
• Models tell us which regions should be considered for further
assessment
29. Bayesian Optimization
Algorithm 1 BayesianOptimization
1: Run N random initial arms
2: for t = 0 to T do
3:   Fit GP model to data
4:   Use acquisition function to select candidates C
5:   Evaluate C on black-box function
6:   Add new observations to dataset
7: end for
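The loop above can be sketched in plain Python. This is an illustration under stated assumptions, not the speaker's implementation: the objective `black_box` is hypothetical, the GP uses a squared-exponential kernel with fixed hyperparameters, the acquisition function is upper confidence bound (UCB) rather than any method named in the talk, and one candidate is selected per iteration instead of a batch C.

```python
import math
import random

def rbf(a, b, length=0.2):
    """Squared-exponential kernel on scalars (the kernel choice is an assumption)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def backward_sub(L, z):
    """Solve L^T a = z."""
    n = len(z)
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a

def gp_posterior(X, y, x_star, noise=0.01):
    """Posterior mean and variance of a zero-mean GP at x_star."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y))
    k_star = [rbf(x_star, a) for a in X]
    mean = sum(k * w for k, w in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(rbf(x_star, x_star) - sum(t * t for t in v), 0.0)
    return mean, var

def black_box(x, rng):
    """Hypothetical noisy objective, maximized at x = 0.7."""
    return -(x - 0.7) ** 2 + rng.gauss(0.0, 0.01)

def bayesian_optimization(n_init=3, n_iter=10, beta=2.0, seed=0):
    rng = random.Random(seed)
    grid = [i / 100 for i in range(101)]           # candidate arms in [0, 1]
    X = [rng.random() for _ in range(n_init)]      # step 1: random initial arms
    y = [black_box(x, rng) for x in X]
    for _ in range(n_iter):                        # step 2: for t = 0 to T
        offset = sum(y) / len(y)                   # center data for the zero-mean GP
        centered = [v - offset for v in y]

        def ucb(x):                                # steps 3-4: fit GP, score candidates
            mean, var = gp_posterior(X, centered, x)
            return mean + beta * math.sqrt(var)

        x_next = max(grid, key=ucb)
        X.append(x_next)                           # steps 5-6: evaluate, add to dataset
        y.append(black_box(x_next, rng))
    return X, y

X, y = bayesian_optimization()
best_x = X[max(range(len(y)), key=y.__getitem__)]
print(f"best observed arm: x = {best_x:.2f}")
```

The GP is refit from scratch for every candidate here, which keeps the sketch short but would be wasteful at realistic sizes; a library such as BoTorch would fit the model once per iteration and optimize the acquisition function directly.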
31. Alternatives
Random Search (Cheaper - 25 arms)
• Maxima can be deduced with only a few, smartly chosen arms
32. Competing Objectives
• Product teams are used to running an A/B test and observing the
outcomes.
• Often, there are multiple competing objectives
33. Competing Objectives
If we want full automation, we need to specify more information in
advance: ideally, "the" scalarized objective
35. Competing Objectives
Decision makers don’t like scalarizations: e.g.
objective = −0.8 · cpu + 1.1 · time spent
36. Competing Objectives
Decision makers prefer constraints:
min(cpu) subject to time spent > 0.7
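The two ways of stating the objective can pick different arms. A small sketch using the scalarization weights and the constraint threshold from the slides; the three arms and their outcome values are made up for illustration:

```python
# Hypothetical arm outcomes (relative to control); these numbers are not from the talk.
arms = {
    "A": {"cpu": 1.0, "time_spent": 1.3},
    "B": {"cpu": 0.4, "time_spent": 0.8},
    "C": {"cpu": 0.2, "time_spent": 0.5},
}

def scalarized(outcome):
    """Scalarized objective from the slide: -0.8 * cpu + 1.1 * time spent."""
    return -0.8 * outcome["cpu"] + 1.1 * outcome["time_spent"]

# Scalarization: maximize a single weighted score.
best_scalarized = max(arms, key=lambda name: scalarized(arms[name]))

# Constraint form from the slide: min(cpu) subject to time spent > 0.7.
feasible = {name: o for name, o in arms.items() if o["time_spent"] > 0.7}
best_constrained = min(feasible, key=lambda name: feasible[name]["cpu"])

print(best_scalarized, best_constrained)  # A B
```

With these made-up numbers the scalarized score favors arm A, while the constrained form selects arm B, the cheapest arm that still clears the time-spent threshold.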
37. Practical Challenges
• Constrained optimization
• Observations often have high variance, leading to potentially large
measurement error
• High noise levels can degrade the performance of many common
acquisition functions including Expected Improvement
38. Solution
For more details, see
• Letham, Karrer, Ottoni, & Bakshy. Constrained Bayesian Optimization with Noisy
Experiments. Bayesian Analysis, 2019.
40. Value Model Tuning
• Ranking teams use value models, which combine multiple predictive models
and features, e.g.
$\mathrm{rank}(Z) = x_1 P(\mathrm{click} \mid Z) + x_2 Z_{\text{num friends}}^{x_3} + f(P(\mathrm{spam} \mid Z)/x_4) + \ldots$
• Not feasible to run sufficiently powered experiments with 20+ arms,
so the team developed a simulator
43. Debiasing Simulations with Multi-Task Models
44. Debiasing Simulations with Multi-Task Models
45. Multi-Task Bayesian Optimization Loop
Algorithm 2 MultiTaskBayesianOptimization
1: Run N random arms online
2: Run M random arms offline with M > N
3: for t = 0 to T do
4: Fit MT-GP model to all data, with each batch as separate task
5: Use NEI to generate q candidates C (e.g. q = 30)
6: Run C on the simulator, fit GP model again
7: Use NEI to generate candidates to run online
8: end for
47. Paper
For more details, see
• Letham & Bakshy. Bayesian Optimization for Policy Search via Online-Offline
Experimentation. 2019. Forthcoming, arXiv:1904.01049
58. Video Upload Transcoding Optimization
Problem
• System receives requests to upload videos of different source qualities
and file sizes from a variety of network connections and devices.
• To ensure high reliability, a video may be transcoded to be uploaded
at a lower quality
• For each video upload request, we have features about
• the video: file size, duration, source resolution
• the network: country, network type, download speed
• the device
Goal
• Maximize quality preserved without decreasing reliability
59. Video Upload Transcoding - CB Problem
• Context: features about video, network, device
• Actions: 360p, 480p, 720p, 1080p
• Outcomes: reliability y(x, a)
• Rewards: ?? some function R(x, a, y)
60. Approach - Bandit Algorithm
Thompson Sampling
• Works well in batch mode
• Hyper-parameter free exploration
• Always "picks the best" codec: picks each codec with probability
proportional to the probability that it is the best
61. Approach - Modeling
Bayesian Linear Model
• Bernoulli likelihood to predict reliability
• Using a neural network feature extractor
• Simple two-layer MLP (50, 4) trained via SGD
• Last layer is a stochastic variational GP with a linear kernel
• Trained via stochastic variational inference using 1000 inducing points
chosen according to a space-filling design
62. Thompson Sampling
Algorithm 3 ThompsonSampling
Input: discrete set of actions $A$, distribution over models $P_0(f)$
1: for t = 0 to T do
2:   Sample model $\tilde{f}_t \sim P_t(f \mid X, y)$
3:   Select an action $a_t \leftarrow \arg\max_{a \in A} \mathbb{E}(r_t \mid x_t, a, \tilde{f}_t)$
4:   Observe reward $r_t$
5:   Update distribution $P_{t+1}(f)$
6: end for
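A minimal sketch of this loop, with two simplifications that are not in the talk: the context $x_t$ is ignored, and the model is an independent Beta-Bernoulli posterior per action instead of the neural network and GP model described on the modeling slide. The action names and success rates are hypothetical.

```python
import random

def thompson_sampling(true_success, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling over a discrete action set."""
    rng = random.Random(seed)
    actions = list(true_success)
    # Beta(1, 1) prior on each action's success probability.
    alpha = {a: 1 for a in actions}
    beta = {a: 1 for a in actions}
    pulls = {a: 0 for a in actions}
    for _ in range(n_rounds):
        # Sample one plausible model from the current posterior.
        sampled = {a: rng.betavariate(alpha[a], beta[a]) for a in actions}
        # Act greedily with respect to the sampled model.
        a_t = max(actions, key=sampled.get)
        # Observe a Bernoulli reward from the (hidden) true success rate.
        r_t = 1 if rng.random() < true_success[a_t] else 0
        # Conjugate posterior update.
        alpha[a_t] += r_t
        beta[a_t] += 1 - r_t
        pulls[a_t] += 1
    return pulls

# Hypothetical success rates for two actions.
pulls = thompson_sampling({"low": 0.2, "high": 0.9})
print(pulls)
```

Exploration needs no tuning parameter here: an action is chosen whenever its sampled value is the largest, so uncertain actions are tried early and the better action receives most of the traffic as the posteriors sharpen.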
63. Issues with Vanilla Thompson Sampling
• Thompson sampling does not account for the constraint
• Change in reliability must be non-negative
• Unclear how to optimally specify reward parameterization
64. Constrained Thompson Sampling
Algorithm 4 ConstrainedThompsonSampling
1: Input: discrete set of actions $A$, distribution over models $P_0(f)$
2: for t = 0 to T do
3:   Receive context $x_t$
4:   Sample model $\tilde{f}_t \sim P_t(f \mid X, y)$
5:   for $a \in A$ do
6:     Estimate outcomes $\tilde{f}_t(x_t, a)$
7:   end for
8:   Fetch action under baseline policy $b \leftarrow \pi_b(x_t)$
9:   Filter feasible actions: $A_{\text{feas}} \leftarrow \{a \in A \mid \tilde{f}_t(x_t, a) \ge \epsilon \cdot \tilde{f}_t(x_t, b)\}$
10:   Select an action $a_t \leftarrow \arg\max_{a \in A_{\text{feas}}} \mathbb{E}(r_t \mid x_t, a, \tilde{f}_t)$
11:   Observe outcome $y_t$
12:   Update distribution $P_{t+1}(f)$
13: end for
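Steps 8 to 10 are the part that differs from vanilla Thompson sampling. The sketch below isolates that decision for a single context, taking the sampled reliabilities as given. The numbers are hypothetical, and the expected reward is assumed to be sampled reliability times a per-action reward on success (with zero reward on failure, as in the deck's reward shaping setup).

```python
def constrained_select(sampled_reliability, reward_on_success, baseline_action, epsilon):
    """One constrained Thompson sampling decision for a single context."""
    # Feasibility threshold relative to the baseline policy's action (step 9).
    threshold = epsilon * sampled_reliability[baseline_action]
    feasible = [a for a, rel in sampled_reliability.items() if rel >= threshold]
    # Expected reward under the sampled model; reward is 0 on failure (step 10).
    return max(feasible, key=lambda a: sampled_reliability[a] * reward_on_success[a])

# Hypothetical sampled reliabilities and success rewards for one upload request.
sampled = {"360p": 0.99, "480p": 0.98, "720p": 0.95, "1080p": 0.90}
rewards = {"360p": 1.0, "480p": 1.1, "720p": 1.2, "1080p": 1.3}

unconstrained = max(sampled, key=lambda a: sampled[a] * rewards[a])
chosen = constrained_select(sampled, rewards, baseline_action="480p", epsilon=0.97)
print(unconstrained, chosen)  # 1080p 480p
```

The baseline action is always feasible when ε ≤ 1, so the feasible set is never empty. In this example the unconstrained rule would pick 1080p for its higher reward, while the constraint rules out both 720p and 1080p because their sampled reliability falls below 97% of the baseline's.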
65. Reward Shaping Setup
Reward Shaping:
• Reward is 0 if the upload is a failure
• Reward is fixed at 1 for a 360p upload success
• Reward is monotonically increasing with quality:
$R(y = 1, a) = 1 + \sum_{a' \le a} w_{a'}$
where $w_i \in (0.0, 0.2]$
Safety Constraint: ε ∈ [0.95, 1.0]
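The reward parameterization can be written as a small function. One reading is assumed here: the sum runs over the quality levels above 360p up to the chosen one, so that a successful 360p upload earns exactly 1 as the slide states. The example weight values are hypothetical.

```python
QUALITIES = ["360p", "480p", "720p", "1080p"]  # ordered from lowest to highest

def shaped_reward(success, action, weights):
    """Reward is 0 on failure, 1 for a 360p success, and grows with quality."""
    for quality, w in weights.items():
        # Each increment must lie in (0.0, 0.2], as on the slide.
        if not 0.0 < w <= 0.2:
            raise ValueError(f"weight for {quality} must be in (0.0, 0.2], got {w}")
    if not success:
        return 0.0
    idx = QUALITIES.index(action)
    # Assumed reading: add the increments of every level above 360p up to `action`.
    return 1.0 + sum(weights[q] for q in QUALITIES[1:idx + 1])

# Hypothetical increments; these are the hyperparameters being tuned.
weights = {"480p": 0.1, "720p": 0.1, "1080p": 0.2}
for q in QUALITIES:
    print(q, round(shaped_reward(True, q, weights), 2))
```

Because every increment is strictly positive, the reward is strictly increasing in quality for any admissible weights, so tuning the weights changes how strongly higher quality is favored without changing the ordering of the actions.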
66. Reward Shaping Optimization
• Teams care about top-line outcomes:
• Reliability: mean reliability per user
• Quality preserved: mean quality (e.g., 1080p preserved, HD) per user
• Other outcomes: watch time, content production
• Difficult to evaluate these outcomes from purely offline data
Solution: Use Bayesian Optimization (via Ax) using online experiments
67. Reward Shaping Optimization
(a) 1080p quality preserved (b) Reliability
Figure: GP-modeled response surface of mean percent change in video quality
and reliability relative to the baseline policy. Each point represents a policy
parameterized by reward function hyperparameters and constraint parameter ε.