Optimizing Budget Constrained Spend in Search Advertising


Chinmay Karande∗   Aranyak Mehta   Ramakrishnan Srikant
chinmayk@fb.com aranyak@google.com srikant@google.com

Google Research
Mountain View, CA, USA

∗The author is currently at Facebook, Inc., Menlo Park, CA, USA. The work described in this paper was done while the author was at Google.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WSDM’13, February 4–8, 2013, Rome, Italy.
Copyright 2013 ACM 978-1-4503-1869-3/13/02 ...$15.00.

ABSTRACT
Search engine ad auctions typically have a significant fraction of advertisers who are budget constrained, i.e., if allowed to participate in every auction that they bid on, they would spend more than their budget. This yields an important problem: selecting the ad auctions in which these advertisers participate, in order to optimize different system objectives such as the return on investment for advertisers, and the quality of ads shown to users. We present a system and algorithms for optimizing such budget constrained spend. The system is designed to be deployed in a large search engine, with hundreds of thousands of advertisers, millions of searches per hour, and with the query stream being only partially predictable. We have validated the system design by implementing it in the Google ads serving system and running experiments on live traffic. We have also compared our algorithm to previous work that casts this problem as a large linear programming problem limited to popular queries, and show that our algorithms yield substantially better results.

Categories and Subject Descriptors
G.1.6 [Optimization]: Linear Programming; K.6.0 [General]: Economics

1. INTRODUCTION
Search ad auctions have emerged as the primary model for monetizing the value provided by search engines. Advertisers use phrases (keywords) to specify the set of queries they are interested in, and bid the cost they are prepared to pay per click on their ad. For each search query, the set of ads to show, the order in which they are shown, and the cost per click for each shown ad are determined via an auction. Each advertiser also specifies a daily budget, which is an upper bound on the amount of money they are prepared to spend each day. While many advertisers use bids as the primary knob to control their spend and never hit their budget, there exists a significant fraction of advertisers who would spend more than their budget if they participated in every auction that their keywords match. Search engines often provide an option to automatically scale the advertiser’s bids [1, 2], but a substantial fraction of budget constrained advertisers do not opt into these programs. For these advertisers, the search engine has to determine the subset of auctions the budget constrained advertiser should participate in. This creates a dependence between auctions on different queries, and leads to essentially a matching or an assignment problem of advertisers to auctions.

In this paper, we consider the problem of optimized budget allocation: allocating advertisers to queries such that budget constraints are satisfied, while simultaneously optimizing a specified objective. Prior work in this area has often chosen revenue as the objective to optimize. However, the long term revenue of a search engine depends on providing good value to users and to advertisers. If users see low quality ads, then this can result in ad-blindness and a drop in revenue. If advertisers see low return on investment (ROI), then they will reduce their bids and budgets, again resulting in a drop in revenue. Thus, we explore two other objectives in this paper: improving quality, and advertiser ROI.

Paper Outline. We describe the problem of optimized budget allocation in Section 2, followed by related work in Section 3. We present our algorithms and system design in Section 4, and discuss certain key properties of the algorithm in Section 5. In Section 6, we show that our algorithms yield substantially better results than prior work and also describe the results of experiments on live traffic. We conclude with some closing thoughts in Section 7.

2. PROBLEM DEFINITION
Let A be the set of advertisers, and Q the set of queries. Each advertiser a ∈ A comes with a daily budget B_a. Let G(A, Q, E) be a bipartite graph such that for a ∈ A and q ∈ Q, edge (a, q) ∈ E means that an ad of a is eligible for the auction for query q (a’s keywords match q). Let ctr_(a,q) be the probability of a click on a’s ad for q, and bid_(a,q) be the amount a is willing to pay per click. (Note that ctr_(a,q) is the position-independent CTR, i.e., the probability of a
click at some chosen fixed position. In other words, ctr_(a,q) does not depend on the position of the ad.)

When a query q arrives, the eligible ads a for q are ranked by bid_(a,q) · ctr_(a,q) and shown in that order. Denoting the jth ad in the order as a_j, the cost per click of a_j is set as

\[
\mathrm{cpc}(a_j) = \frac{\mathrm{bid}_{(a_{j+1},q)} \cdot \mathrm{ctr}_{(a_{j+1},q)}}{\mathrm{ctr}_{(a_j,q)}}
\]

This is known as the generalized second price (GSP) auction (see, e.g., [25, 14, 6]).

Let T_a denote the spend of the advertiser a if a participates in all the auctions for which a is eligible via the keyword match (ignoring a’s budget). If T_a > B_a, the advertiser is budget constrained, and the search engine has to limit (or throttle) the set of auctions in which the advertiser participates.

Objectives. For budget constrained advertisers, the search engine can choose to optimize different objectives such as:

• the quality of the ads shown, e.g., maximize the position-independent predicted click-through rate,
• reduce advertiser cost-per-click (maximize the number of clicks), or
• increase advertiser ROI.

We do not have full knowledge of the graph G, but only information from past data, i.e., the graphs from previous days. We also require practical algorithms which work online, i.e., given the next query, decide, in sub-second response time, which ad impressions to show. We can now define the problem as follows.

The Optimized Budget Allocation Problem: Given information about the past as G′(A, Q′, E′) including the bids and CTR for each advertiser-query pair, the budget B_a for each advertiser a, and a specified objective function to optimize: For each query arriving in an online stream of queries Q, decide which advertisers should participate in the auction.

3. RELATED WORK
There have been two broad approaches to optimizing budget constrained spend: allocation and bid modification. Allocation treats bids as fixed, and allows only decisions about whether the advertiser should participate in the auction. This is our setting, where we are constrained to not change bids, but only optimize allocations.

The second approach, bid modification, is in a setting where bids can be changed. This body of work typically considers the problem from the advertiser’s perspective, and assumes full knowledge of the information that advertisers typically have (such as the value of clicks or conversions). However, this work can also be adapted to be applicable from a search engine’s perspective, for advertisers who have opted in and allow the search engine to change their bids.

We next describe the related work in the allocation and bid modification approaches.

Allocation. The paper by Abrams et al. [5] is the closest to our work. They solve the offline problem (with complete knowledge of future query frequency) for head queries (the most popular search queries) in the GSP auction setting using a linear program (LP), to optimize search engine revenue or advertiser clicks per dollar. Thus their approach yields an optimal solution to the click maximization problem in the general GSP setting. However, there are two reasons why this work is not the final word on the allocation approach. First, the LP can only run over head queries due to resource constraints, which brings up an interesting question: Can a non-optimal algorithm that runs over the entire query stream beat an optimal algorithm that is restricted to the head? Second, the LP formulation can yield solutions that are clearly unfair, in that the allocation for some advertisers is very different from what the advertiser would choose for themselves for that objective (see Section 5). Hence it is unclear whether LP solutions can be deployed by search engines.

A second stream of work focused on optimizing search engine revenue in a first price auction. Mehta et al. [22] and Buchbinder et al. [9] both provide an online approximation algorithm for optimizing revenue, with a best possible approximation guarantee of 1 − 1/e ≈ 63%, for the scenario in which we do not know anything about the future distribution of queries. In contrast, we assume that we can predict future distributions of queries, albeit noisily. Mahdian et al. [21] extended the algorithm from [22] to provide guarantees in the setting when we have unreliable estimates. [19, 13] analyzed revenue maximization in an average case setting where queries arrive in a random order, or are picked i.i.d. from a distribution. All the above papers focus solely on search engine revenue, assume a first price auction, and do not consider multiple slots or positions. We focus on optimizing very different objectives, such as user experience and advertiser ROI, in the second price GSP auction, with multiple slots and positions.

There has been considerable related work in budget allocation for display ads, e.g., [11, 12, 15, 16, 26]. This work also does not consider second price auctions, or the objectives we study.

There has also been work on designing new incentive compatible auctions in the presence of bidders with budget constraints, initiated by [4, 8]. Our goal is to optimize budgets in the context of the GSP auction used by search engines, and hence we do not consider alternate auction designs.

Bid Modification. The second approach [7, 10, 17, 20, 24] studies the advertiser’s problem of bidding optimally in order to maximize ROI (or some notion of utility). Changing bids in such a manner is an alternate solution to dealing with budget constrained advertisers: if the advertiser permits, the search engine can scale bids down until the advertiser is no longer budget constrained. However, despite the availability of a free option to automatically scale bids [1], a substantial fraction of budget constrained advertisers have not opted in to use this product. For these advertisers, the search engine (as a policy decision) has to use the allocation approach, not bid modification. Hence the work on bid modification, while clearly an appealing alternative, is not applicable in our setting.

Despite the lack of applicability, it would be interesting to understand how optimized budget allocation fares against bid scaling. We compare our algorithms to bid scaling in Section 6.
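
As background for the algorithms that follow, the GSP ranking and pricing rule from Section 2 can be sketched in a few lines of Python. This is an illustrative sketch, not the production implementation: the function name `gsp_auction`, the tuple layout, the sample numbers, and the `reserve` price charged to the lowest-ranked ad are all assumptions made here (Section 2 does not specify what the last ad pays).

```python
def gsp_auction(ads, reserve=0.0):
    """Rank ads by bid * ctr and price each ad off the ad ranked just below it.

    ads: list of (name, bid, ctr) tuples, with ctr the position-independent CTR.
    Returns a list of (name, cpc) in rank order.
    """
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    results = []
    for j, (name, _bid, ctr) in enumerate(ranked):
        if j + 1 < len(ranked):
            _, next_bid, next_ctr = ranked[j + 1]
            # cpc(a_j) = bid(a_{j+1}) * ctr(a_{j+1}) / ctr(a_j)
            cpc = next_bid * next_ctr / ctr
        else:
            # Assumption: the lowest-ranked ad pays a reserve price.
            cpc = reserve
        results.append((name, cpc))
    return results


# Hypothetical inputs, chosen so the arithmetic is exact in floating point.
example_ads = [("a", 2.0, 0.5), ("b", 3.0, 0.25), ("c", 1.0, 0.5)]
example_prices = gsp_auction(example_ads, reserve=0.1)
print(example_prices)
```

In this made-up auction the rank scores are 1.0, 0.75 and 0.5, so a ranks above b even though b bids more, and each ad pays no more than its own bid.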

4. ALGORITHMS
Given the problem of optimizing budget constrained spend, the first step is to neither over- nor under-spend.¹ The naive way to do this is to let advertisers participate in auctions until they hit their budget, and then make them ineligible for the rest of the day. Clearly, this will yield very biased traffic to the advertiser, and also skew auction competition towards the earlier part of the day. The next simplest approach, which does not have these drawbacks, is Vanilla Probabilistic Throttling (VPT).

¹ While most advertisers are clearly budget constrained or unconstrained, advertisers at the margin may switch back-and-forth between the two states, based on traffic. It is straightforward to handle these marginal cases. For ease of exposition, we ignore this issue in the paper; however our implementation does handle these cases gracefully.

For each advertiser a, define:
• B_a: the remaining budget for the day (or time period).
• T_a: the remaining maximum spend for the rest of the day, i.e., the total spend if the advertiser had unlimited budget.

We now define the Vanilla Probabilistic Throttling algorithm:

For each arriving query q:
    For each budget constrained advertiser a:
        Flip a coin with P[Heads] = B_a / T_a.
        If heads, a participates in the auction.

Figure 1: Vanilla Probabilistic Throttling (VPT)

If our estimate of T_a is accurate, then each advertiser spends very close to her budget in expectation (and with high probability, by Chernoff bounds). Advertisers also receive a representative sample of the traffic they are eligible for.

4.1 Optimized Throttling
We now present algorithms for optimizing one or more of the following objectives:
• average quality of ads shown, represented by CTR,
• clicks per dollar,
• conversions per dollar,
• the advertiser’s profit, using the difference between the bid and the cost per click as the estimate of profit.

For the chosen objective, given a candidate ad impression i (i.e., for a specific query and advertiser), we compute a metric θ(i) which tracks the desired objective. Given a choice between two impressions (for the same advertiser), we would prefer to show the impression with the higher value of the metric. For example, if the objective is quality, the metric could be the position-independent predicted click-through rate of the advertiser for this query, reflecting our desire to show higher-quality impressions. Conceptually, we would like to rank all the impressions for an advertiser by the desired metric, and choose the impressions with the highest metric score until the budget is filled.

To achieve this goal, our algorithm uses a third input, the rank R_{θ,a} of an impression for a given advertiser a and metric θ. Define
• F_{θ,a}(µ): Estimated fraction of maximum spend T_a for which θ(i) ≤ µ. In other words, F is the estimated cumulative distribution function of θ.
• R_{θ,a}(µ) = 1 − F_{θ,a}(µ).
The lower the value of the rank R, the better the impression scores on our metric.

We now define the Optimized Throttling algorithm:

For each arriving query q:
    For each budget constrained advertiser a:
        If R_{θ,a}(θ(i)) ≤ B_a / T_a, then a participates in the auction.

Figure 2: Algorithm OT

While the algorithm appears to be a straightforward greedy algorithm, there are several subtleties:
• The algorithm yields a solution that is “fair” to each advertiser (formally defined, and proved in Section 5).
• The algorithm yields an optimal fair solution under certain constraints. While we can find many examples where the constraints don’t hold, the constraints hold often enough that the algorithm is not too far from optimal in practice.
• We have transformed the domain from the space of queries to the distribution of some property of queries. Predicting the frequency of queries in the tail is intractable. Predicting the distribution of specific properties of the queries in the tail is very tractable (for properties we use in our algorithms).
• The choice of metric lets us optimize a wide range of objectives, or combinations of objectives (discussed next).

We now define five instantiations of OT, corresponding to the four objectives we listed earlier, and a fifth objective that combines quality and profit:

Instantiation     Objective      θ(i)
OT-CTR            Ad quality     ctr_i
OT-Clicks         Clicks         1 / cpc_i
OT-Profit         Profit         (bid_i − cpc_i) / cpc_i
OT-Conversions    Conversions    cvr_i · val_i / cpc_i
OT-CTR-Profit     Blend          ctr_i · (bid_i − cpc_i) / cpc_i

Figure 3: Instantiations of OT

The first metric, ctr_i, is straightforward: we are using position-independent predicted CTR as a proxy for quality. (Of course, one could use any arbitrary quality metric instead of CTR.)

To understand the next three metrics, it is helpful to multiply the numerator and denominator by the position-dependent predicted CTR. For example, in OT-Clicks, the
numerator now becomes the expected number of clicks, and the denominator the expected cost, and the ratio is the clicks per dollar. Since the total spend is fixed for budget constrained advertisers, optimizing clicks is equivalent to optimizing clicks per dollar.²

² Some metrics like ctr_i are purely a function of the impression. However, metrics like cpc may change based on the other ads in the auction. We discuss this issue in Section 4.2.

For OT-Profit, assuming that bid_i is the value to the advertiser, bid_i − cpc_i is the expected profit to the advertiser if there is a click on this impression. So the metric for OT-Profit is the expected profit per dollar of spend.

For OT-Conversions the metric is the expected conversion value per dollar, given the value of a conversion on this impression val_i, and a model that predicts the conversion rate cvr_i. Building machine learning models for estimating cvr_i is beyond the scope of the paper. However, we will note the existence of commercial systems that estimate cvr_i given advertiser-specific conversion data, e.g., [2]. Advertisers do not necessarily need to provide conversion data in order to benefit from the techniques in the paper. One can build models for estimating conversion rate that are not advertiser-specific, e.g., we found that ctr_i is correlated with cvr_i.

The final metric simply multiplies the metric for CTR and profit. The intuition is that for advertisers with a lot of variance on CTR but not much on profit, the algorithm will focus on CTR. Similarly for advertisers with more variance on profit than CTR, the algorithm will focus on profit. Thus if the search engine cares about both CTR and advertiser profit, the blended metric will likely yield better results than simply averaging the results of the individual metrics.

4.2 System design and implementation
We next describe our implementation of the algorithm in the production Google ads serving system. Our system has three primary components:
• estimating B_a / T_a,
• estimating and compressing R_{θ,a}, and
• using the estimates at serving time.
We will use OT-CTR to illustrate the techniques used, and discuss any differences between OT-CTR and the other instantiations as they come up.

Estimating B_a / T_a: This component estimates the impression probability ip, where ip is defined as B_a / T_a. In other words, ip is the probability with which we should allow an impression of the advertiser to participate in an auction, in order to show her impressions uniformly through the remainder of the day, and exhaust her budget at the end of the day. We estimate ip using traffic information from the past, and using the available budget. Given the inherent noise in traffic, a feedback loop is used at frequent intervals to adjust the probability. Clearly, the more accurate the estimate of ip, the more the gains from optimization.

Estimating and compressing R_{θ,a}: We use historical data to compute the cumulative distribution function F_{θ,a}. R_{θ,a} is a trivial transformation of F_{θ,a}. In the rest of this section, we will drop the subscripts and refer to R_{θ,a} as R (purely for ease of exposition). Since a search engine has a large number of advertisers, we would like to compress the information in R at serving time. We compress R into a histogram H.

Recall that R is only used to answer the question of whether R(θ(i)) is less than ip (= B_a / T_a). So if the estimate of ip was very stable, we need just two buckets in the histogram H, with the boundary at the value c* such that R(c*) = ip. With two buckets, we would need just 8 bytes of data per budget constrained advertiser: the value c* and the value of R(c*).

We may choose to create additional buckets around the threshold c*, based on the tradeoff between increased gains from choosing the highest scoring impressions (see below) versus memory constraints. For each bucket m in H excluding the last bucket, we store the bucket boundary and the value of R at the bucket boundary. We refer to these histograms as throttling parameters.

Using the estimates at serving: When a query arrives, we need to determine, for each budget constrained advertiser a, whether a participates in the auction. The input consists of the histogram H, the current value of ip, and the value of θ(i) for the current impression i. Let θ(i) be in bucket m. Let H_b(m) denote the upper boundary of bucket m, and let H_r(m) = R(H_b(m)). Then the advertiser participates in the auction with the following probability:

\[
\begin{cases}
1, & \text{if } H_r(m) \le ip \\
0, & \text{if } H_r(m-1) \ge ip \\
\dfrac{ip - H_r(m-1)}{H_r(m) - H_r(m-1)}, & \text{otherwise}
\end{cases}
\]

The first two cases are straightforward, and follow directly from the goal that we (do not) show the impression if it is (not) in the top ip fraction of spend. In the third case, the probability that the impression is in the top ip fraction of spend is given by (ip − H_r(m − 1)) / (H_r(m) − H_r(m − 1)), and hence we show the impression with that probability.

Implementation: We have built our data collection pipeline on top of Google’s Sawzall [23] infrastructure, which allows us to process historical query data in parallel. The throttling parameters generated by the data collection pipeline are written to the Google File System (GFS) [18]. The data is stored in the protocol buffer format [3], which reduces the storage requirement and makes the transfer and processing of the data more efficient. From GFS, the throttling parameters data is picked up by the ads data push system, which writes it to one of its data channels. The ads serving system gets the updated throttling parameters through this channel.

Interactions between budget constrained advertisers: Out of the metrics in Figure 3, ctr_i, bid_i, cvr_i and val_i are all functions purely of the impression, and do not change based on the other ads in the auction. However, cpc_i is a function of the runner-up, since we use a second-price auction. We analyzed the logs, and found that budget constrained ads are more likely to be next to budget unconstrained ads than to other budget constrained ads. However, we will have instances with consecutive budget constrained ads. We use an iterative technique for improving the performance of the online algorithms in these cases.

The intuition behind the iterative technique is to run several (simulated) iterations of the auction. In each iteration, we compute the metric θ for each budget constrained advertiser (including those that do not participate in the auction in the current iteration), and based on the value of the metric, decide whether that advertiser participates in the next iteration. Note that an impression may be removed in one iteration and re-enter in a subsequent iteration. While we cannot prove convergence, in practice this often converges in a few rounds, or at least leads to an improved solution over simply using the value of θ from the first iteration.

5. KEY PROPERTIES OF ALGORITHM
The overall effectiveness of the algorithm depends on two factors: the accuracy of the prediction of future traffic (total traffic and the distribution of the metric), and the intrinsic effectiveness of the algorithm. To understand the latter, we analyze the algorithm on the offline version of the problem, in which the advertiser-query graph is known. We will focus on the instantiations with a linear objective: OT-Clicks, OT-Profit and OT-Conversions. For non-linear objectives such as CTR, optimality with even a single budget constrained advertiser may require allocation in a manner that is clearly against the advertiser’s interest. So algorithms like OT-CTR which are designed to both be good for advertisers and improve quality cannot be optimal.

We first consider the special case of a single budget constrained advertiser, and show that our algorithm is optimal for linear objectives. We then discuss the issue of fairness when there are multiple budget constrained advertisers, and show by example that linear programming can yield solutions that are optimal but not fair. When we restrict the space of solutions to fair solutions, we show that our algorithm yields an optimal solution even with multiple budget constrained advertisers, as long as there are no adjacent budget constrained advertisers in a given auction. Obviously, we do get adjacent budget constrained advertisers in the real world, but the optimality result with that constraint suggests that the algorithm will perform well in practice.

For ease of exposition, we will choose the simplest instantiation, OT-Clicks, as the representative algorithm in the proofs. It is straightforward to sketch out similar proofs for OT-Profit and OT-Conversions.

5.1 Optimal For A Single Budget Constrained Advertiser
We start by proving that given G(A, Q, E) and a single budget constrained advertiser a ∈ A, OT-Clicks maximizes clicks per dollar for a. While the proof is obvious, it is useful as a building block to more interesting results.

Given an advertiser a, we define:
• E_a to be the set of impressions in queries where a is eligible to participate in the auction, and
• I_a ⊆ E_a to be the set of impressions where a participates in the auction.

The total expected clicks is \(\sum_{i \in I_a} \alpha_i\,\mathrm{ctr}_i\), where α_i is the position normalizer. For fixed budget, maximizing clicks per dollar is the same as maximizing total clicks, which is captured by the following linear program: Given a, find

\[
\max_{I_a \subseteq E_a} \sum_{i \in I_a} \alpha_i\,\mathrm{ctr}_i
\quad \text{s.t.} \quad
\sum_{i \in I_a} \mathrm{spend}_i \le B_a
\tag{1}
\]

• Rank the impressions i ∈ E_a in order of decreasing 1 / cpc_i.
• Pick the top impressions in E_a according to this ranking until the budget runs out, i.e. the largest prefix I_a such that \(\sum_{i \in I_a} \mathrm{spend}_i \le B_a\), and at most one additional fractional impression to finish the budget.

Figure 4: Offline-OT-Clicks-Single-Advertiser

Without fractional impressions, this is the integral knapsack problem. Since one click is a tiny fraction of advertiser spend, we allow the choice of one fractional impression, thereby converting the problem to a fractional knapsack problem. Observing that spend_i = α_i ctr_i cpc_i and therefore α_i ctr_i / spend_i = 1 / cpc_i, we get the algorithm in Figure 4, which is a simple greedy algorithm, using the ratio of the expected value from the click to the cost of the click. Theorem 1 follows from the well known optimality of the greedy strategy for the fractional knapsack problem, and its proof is omitted here.

Theorem 1. Offline-OT-Clicks-Single-Advertiser computes an optimal solution to the ROI maximization problem for a single budget constrained advertiser.

5.2 Fair Allocations
We begin with the following definition of an optimal allocation:

Definition 1. We call an allocation optimal if it maximizes

\[
\frac{\sum_{a \in A} w_a \sum_{i \in I_a} \alpha_i\,\mathrm{ctr}_i}{\sum_{a \in A} w_a}
\]

over all possible allocations (given the w_a ≥ 0, which are arbitrary advertiser specific weights).

However, this definition has the problem that in trying to maximize the weighted average of advertiser ROI, we may end up sacrificing the interests of some advertisers, as the following example illustrates.

Example 1. There are two budget constrained bidders, a and b, each with a budget of $100. There are two different queries q1 and q2, each with 100 instances. q1 has a minimum reserve cpc of $1, and q2 has a reserve of $2 (one may replace the reserves by an unconstrained bidder, keeping the example unchanged). The bidders bid the following values for the queries (a does not bid on q2):

         q1     q2
a        20     −
b        10     10
min       1      2

For ease of exposition, we assume that the CTR is equal for all advertisers and query pairs, in both positions. To maximize total clicks, or equivalently, clicks per dollar, the
optimal solution is to let a participate in q1, and b in q2. Then a gets 100 clicks and b gets 50, giving a total of 150 clicks at a cost-per-click of $1.33. But this solution is not fair to b, who would rather show for q1 and get a cpc of $1 and hence 100 clicks instead of 50. In this scenario a would get only 10 clicks at a cpc of $10, giving a total of 110 clicks at an average cpc of $1.82. □

Example 1 motivates the following definition:

Definition 2. We define an allocation I ⊆ E to be a fair allocation for the clicks objective, if ∀ a ∈ A:

a. The expected spend of a is equal to B_a (a exhausts its budget), or I_a = E_a (we show every impression of a it is eligible for).

b. Given the allocation of other advertisers (i.e. I \ I_a), I_a maximizes the total expected clicks that a can obtain within its budget.

We call an allocation I an optimal fair allocation if it is a fair allocation, and it maximizes

\[
\frac{\sum_{a \in A} w_a \sum_{i \in I_a} \alpha_i\,\mathrm{ctr}_i}{\sum_{a \in A} w_a}
\]

among all fair allocations (where the w_a ≥ 0 are arbitrary advertiser specific weights).

We can similarly define fair allocations and optimal fair allocations for the profit objective and the conversions objective.

In Example 1, the allocation which maximizes the sum (giving 150 clicks) is not a fair allocation, while allocating both a and b to q1 is (even though this reduces the total clicks to 110). We note that our definition of fair allocation is analogous to that of a Nash equilibrium in games.

5.3 Optimal Fair Allocation When No Adjacent Budget Constrained Advertisers
When there are multiple budget constrained advertisers, a natural local algorithm is to cycle over the different advertisers until convergence, running Algorithm Offline-OT-Clicks-Single-Advertiser for each advertiser a. In the remainder of this section we analyze this algorithm, which is defined in Figure 5.

1. Begin with an allocation with only budget unconstrained advertisers.
2. For each budget constrained advertiser a ∈ A (in turn):
    - Run Offline-OT-Clicks-Single-Advertiser for a.
    - Update I.
3. If the allocation has not converged, go to step 2.

Figure 5: Algorithm Offline-OT-Clicks.

In a GSP auction, there are two ways in which the introduction or removal of an impression i of one budget constrained advertiser can affect an impression i′ of another:

• i can change the cpc of i′
• i can change the position of i′, and hence the expected spend from i′.

Due to this interaction, it is unclear whether Algorithm Offline-OT-Clicks always yields an optimal fair allocation. However we will show that it does converge to an optimal fair allocation when there are no adjacent budget constrained bidders, i.e., in the auction ranking of all eligible bidders, there are no consecutive budget constrained bidders. The key property is that in such a scenario, the cpc of a budget constrained impression is independent of other budget constrained impressions (even though the position may change due to the insertion or removal of other budget constrained impressions). As an aside, this key property also holds for first price auctions, and the following results also hold for first price auctions.

Given this property, for each advertiser a, the 1 / cpc ordering of the impressions in E_a is fixed throughout the algorithm, independent of the allocations of the other advertisers. From the definition, it is easy to see that every fair allocation has the property that for each a ∈ A, I_a is a prefix of this fixed ordering of E_a, since a non-prefix would violate Property (b) in Definition 2. By definition of the algorithm, this is true also for the allocations produced during every step of the algorithm. We call allocations with this property prefix-allocations.

Lemma 2. Algorithm Offline-OT-Clicks converges to a fair allocation if there are no adjacent constrained bidders in any query.

Proof. Fix an advertiser a, and consider the allocation to a in each round. We claim that the prefix chosen for a only increases in length in subsequent rounds. Since for each advertiser the prefix cannot increase indefinitely, this means that the algorithm converges to some allocation, say I*. Property (a) in Definition 2 holds for I* because of the termination condition of Offline-OT-Clicks. Property (b) holds by definition of Offline-OT-Clicks-Single-Advertiser, which is run in every round of Offline-OT-Clicks.

It remains to prove the claim that the prefix for a can only increase in each round. Suppose this is true up to the time we process advertiser a in round k. In between the times we process a in rounds k and k + 1, the algorithm may have introduced impressions of other advertisers in the auctions for which we show a’s impressions in round k. The only effect this can have on a’s impression is to possibly lower its position, and therefore its expected spend. Thus, in round k + 1, the algorithm may need to pick a longer prefix to finish a’s budget.

For two prefix-allocations I, J, we say I ⪯ J if for every advertiser a ∈ A, the prefix length in I is at most the prefix length in J, and therefore I_a ⊆ J_a.

Lemma 3. Let I* be the fair allocation that Algorithm Offline-OT-Clicks converges to (when there are no adjacent constrained bidders). Then

I* ⪯ I, for all fair allocations I

Proof. If this is not true for some fair allocation I, then consider the first time during the run of the algorithm that
some advertiser a’s prefix becomes longer than its prefix in I. Comparing to I, the algorithm’s current allocation has all advertisers a′ ≠ a with smaller or equal prefixes. Thus the position normalizers of a’s impressions are larger or equal during this step of the algorithm than in the allocation I. This implies that the prefix of a in the current allocation should be shorter than or equal to that in I, since the expected spend in each auction is at least that in I, a contradiction.

Note that Lemma 3 implies that there is a unique ⪯-minimal fair allocation, and that the algorithm converges to it.

Theorem 4. Algorithm Offline-OT-Clicks converges to an optimal fair allocation if there are no adjacent constrained bidders in any query.

Proof. From Lemma 2 we know that the algorithm converges to a fair allocation I*. From Lemma 3 we get that for every advertiser a, and every fair allocation I, a’s prefix in I* is no longer than its prefix in I. Thus a spends the same amount of money (in expectation) in I* as in I, but spends it on a subset (I*)_a ⊆ I_a such that impressions in I_a \ (I*)_a have lower (or equal) ratios of value of click to cost of click than impressions in (I*)_a. This implies that a gets at least as much total expected value in I* as in I, in turn implying that I* maximizes any weighted average (over advertisers) of the total expected value obtained, among all fair allocations.

As mentioned earlier, it is easy to prove similar statements about other linear objectives such as optimizing profit and optimizing conversions.

6. EMPIRICAL EVALUATION

We report two types of experiments below: offline simulations to compare different optimized allocation solutions, and live experiments on Google traffic. We start with a description of prior approaches: LP and BidScaling.

6.1 LP and BidScaling

6.1.1 Linear Programming

Assuming complete knowledge of the data, a theoretical benchmark for any budget allocation algorithm can be obtained via linear programming [5]. For any query q, let s denote the set of bidders chosen to participate in the auction by an allocation mechanism. Let C(q) be the collection of all such sets, generated by any conceivable algorithm. Clearly, we can completely specify an allocation policy by specifying for each query q, the set s ∈ C(q) of bidders permitted by the algorithm to participate in the auction for q.

We can then attempt to discover the best allocation policy (for a given linear objective, say click maximization) using linear programming. Let N_q be the number of times query q appears in the data and x_qs be the number of times the set s of bidders is selected for query q. For a bidder i ∈ s, let cpc_qsi and ctr_qsi be the cost-per-click and click-through rate for i in the auction for q. Let α_si be the position normalizer for impression i. Let B_i be the budget of bidder i. Then, we can discover the best allocation policy that maximizes total clicks provided to budget constrained advertisers as the solution of the following linear program:

\[
\max \sum_{q,s,i} \alpha_{si}\, \mathrm{ctr}_{qsi}\, x_{qs}
\]
\[
\text{Allocation constraint:} \quad \sum_{s} x_{qs} \le N_q \quad \forall q
\]
\[
\text{Budget constraint:} \quad \sum_{q,s} \mathrm{cpc}_{qsi}\, \alpha_{si}\, \mathrm{ctr}_{qsi}\, x_{qs} \le B_i \quad \forall i
\]

One can optimize for other linear metrics, such as revenue, by suitably changing the objective function. It is not possible to directly optimize for non-linear metrics such as CTR.

The advantage of LP (compared to our approach) is that the LP fully incorporates interactions between different budget constrained advertisers. However, even the most efficient LP solvers cannot solve linear programs on the volume of data a search engine sees in a day. Thus we have to limit LP to the head portion of query traffic, and use an approach such as Vanilla Probabilistic Throttling on the long tail.

6.1.2 Bid Scaling

Both the OT algorithms and the LP formulation work under the constraint that the bidder’s inputs (bids) cannot be changed by the search engine; the bidder can only be throttled. Bid Scaling algorithms go outside this design space by lowering the bids of the constrained bidders appropriately. As we discussed in Section 1, bid scaling is not an option for our problem. Nevertheless, it is interesting to compare our approach against bid scaling.

Our bid scaling algorithm finds one bid multiplier per bidder and applies it to all the advertiser’s bids, similar to the Budget Optimizer product in Google Adwords [1]. The multiplier is calculated so as to spend exactly the bidder’s budget.

6.2 Simulation Methodology

We conducted simulations using a 20% sample of all US queries made over a week to the Google search engine. We sorted queries by the total impressions for that query (summed over all query instances), and picked the queries with the most impressions as the “head” queries for the LP. The number of queries in the head was chosen so that the LP could run in memory. (Note that as we move from the head to the tail, each query has relatively few instances. So even doubling the memory will not substantially increase the fraction of revenue or clicks covered by the “head” queries.) For each set (head and tail), we computed an appropriate budget for that set by scaling down the total budgets. For each query we also have the candidate ads together with the relevant metrics, namely, the bid and the predicted CTR. Our simulation is offline, i.e., the set of queries and candidates is fixed. Since all the algorithms we simulate are time independent (as opposed to some of the bid scaling algorithms studied earlier, e.g., [22, 9, 13, 16, 15]), we do not need to worry about the time-arrival order of the queries.

We simulated the following throttling algorithms for our first set of comparisons:

1. VPT: Vanilla Probabilistic Throttling.
2. OT-Clicks: Optimized Throttling, objective is clicks (or inverse of cpc).
3. OT-CTR: Optimized Throttling, objective is CTR.
4. BidScaling, as described in Section 6.1.2.
5. LP-Clicks: Click maximizing Linear Program on head queries.
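As an illustration of the bid scaling baseline of Section 6.1.2, the following is a minimal sketch rather than the method used in the simulations. It assumes a toy single-slot second-price model in which each auction is a hypothetical triple (bid, price_to_beat, ctr): the advertiser wins when the scaled bid reaches price_to_beat, and then pays price_to_beat per click. The names expected_spend and bid_multiplier are illustrative. Because spend is a step function of the multiplier in this toy model, bisection returns the largest multiplier found whose expected spend stays within the budget, which only approximates "spending exactly the budget".

```python
from typing import List, Tuple

# Hypothetical toy auction record: (advertiser bid, price to beat, click-through rate).
Auction = Tuple[float, float, float]


def expected_spend(multiplier: float, auctions: List[Auction]) -> float:
    """Expected spend when every bid is scaled by `multiplier` (toy model)."""
    return sum(
        price * ctr
        for bid, price, ctr in auctions
        if multiplier * bid >= price  # the scaled bid still wins the slot
    )


def bid_multiplier(auctions: List[Auction], budget: float, iters: int = 50) -> float:
    """Bisect for a multiplier in [0, 1] whose expected spend fits the budget.

    Assumes all prices are positive, so a multiplier of 0 spends nothing.
    """
    if expected_spend(1.0, auctions) <= budget:
        return 1.0  # the advertiser is not budget constrained
    lo, hi = 0.0, 1.0  # invariant: spend(lo) <= budget < spend(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_spend(mid, auctions) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


# Toy data: three auctions with the same bid but different prices to beat.
toy_auctions = [(2.0, 1.0, 0.5), (2.0, 1.5, 0.5), (2.0, 0.5, 0.5)]
toy_multiplier = bid_multiplier(toy_auctions, budget=0.8)
```

In this toy instance the unscaled bids would spend 1.5 against a budget of 0.8, so the sketch lowers the multiplier until the most expensive auction is dropped. Note the contrast with throttling: scaling changes which auctions are won through the bid itself, whereas the OT algorithms leave bids untouched and only decide participation.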

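To make the indices in the linear program of Section 6.1.1 concrete, the sketch below evaluates a candidate allocation against the click objective, the allocation constraint, and the budget constraint. It does not solve the LP. All names and toy numbers are hypothetical, and it assumes that the budget dictionary lists exactly the budget constrained bidders whose clicks are counted in the objective.

```python
from typing import Dict, FrozenSet, Tuple

BidderSet = FrozenSet[str]


def evaluate_allocation(
    x: Dict[Tuple[str, BidderSet], float],         # x[(q, s)]: times set s is used for query q
    n: Dict[str, float],                           # n[q]: occurrences of query q
    cpc: Dict[Tuple[str, BidderSet, str], float],  # cpc[(q, s, i)]
    ctr: Dict[Tuple[str, BidderSet, str], float],  # ctr[(q, s, i)]
    alpha: Dict[Tuple[BidderSet, str], float],     # alpha[(s, i)]: position normalizer
    budget: Dict[str, float],                      # budget[i], constrained bidders only
    tol: float = 1e-9,
) -> Tuple[float, bool]:
    """Return (expected clicks, feasibility) of allocation x under the LP."""
    clicks = 0.0
    spend = {i: 0.0 for i in budget}
    served = {q: 0.0 for q in n}
    for (q, s), count in x.items():
        served[q] += count
        for i in s:
            if i not in budget:
                continue  # assumption: unconstrained bidders are not counted
            expected_clicks = alpha[(s, i)] * ctr[(q, s, i)] * count
            clicks += expected_clicks
            spend[i] += cpc[(q, s, i)] * expected_clicks
    feasible = all(served[q] <= n[q] + tol for q in n) and all(
        spend[i] <= budget[i] + tol for i in budget
    )
    return clicks, feasible


# Toy instance: one query seen 10 times, served with {a} four times and {a, b} six times.
s_a, s_ab = frozenset({"a"}), frozenset({"a", "b"})
toy_x = {("q1", s_a): 4.0, ("q1", s_ab): 6.0}
toy_n = {"q1": 10.0}
toy_alpha = {(s_a, "a"): 1.0, (s_ab, "a"): 1.0, (s_ab, "b"): 0.5}
toy_ctr = {("q1", s_a, "a"): 0.1, ("q1", s_ab, "a"): 0.1, ("q1", s_ab, "b"): 0.2}
toy_cpc = {("q1", s_a, "a"): 1.0, ("q1", s_ab, "a"): 2.0, ("q1", s_ab, "b"): 1.0}
toy_budget = {"a": 2.0, "b": 1.0}
toy_clicks, toy_feasible = evaluate_allocation(
    toy_x, toy_n, toy_cpc, toy_ctr, toy_alpha, toy_budget
)
```

The toy instance also shows the cpc interaction discussed in Section 5.3: adding b to the auction raises the hypothetical cpc of a from 1.0 to 2.0, so the same allocation becomes infeasible if a's budget is cut to 1.0.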
6.3 Results

We show the changes in various objectives relative to the baseline of Vanilla Probabilistic Throttling (VPT). It is important to note that while we expect the overall conclusions to carry over to an online setting where the query distribution changes over time, the exact numbers will change. In general, the gains from optimized budget allocation or bid scaling will be significantly lower in live experiments due to changes in query traffic. For this reason, as well as data confidentiality, we omit the scale from our graphs below.

6.3.1 Comparison with LP

Figure 6 shows the change in clicks per dollar for budget constrained advertisers for each of the algorithms. The first set of numbers, “head”, shows the results when we artificially restrict all the algorithms to operate over the same set of head queries as LP-Clicks, with VPT on the tail. Since LP-Clicks is not just optimal, but can also generate solutions that are not fair (unlike the other algorithms), it is not surprising that LP-Clicks outperforms the alternatives.

Figure 6: Impact on clicks-per-dollar, over budget constrained campaigns. The baseline is VPT.

However, when we allow the algorithms to optimize over the entire dataset – the “all” numbers – the algorithms that can use the full data dramatically outperform LP-Clicks. In fact, even OT-CTR, which is optimizing CTR and not CPC, yields a higher drop in CPC (or equivalently, more clicks per dollar) than LP-Clicks. The reason for the poor performance of LP-Clicks is that the LP can be run only on the head, and even though the head queries account for a substantial portion of revenue, they are relatively homogeneous – the potential gains from optimization are more in the tail than the head. We found that this held for the other metrics as well, i.e., the substantial majority of the gains from optimization came from the tail queries.³

³ An implementation of LP that used more resources could narrow the gap by increasing the fraction of queries covered by the “head”. However, as the number of distinct queries increases rapidly for each fraction of additional coverage, every doubling of resources will only yield incremental gains.

6.3.2 Comparison with BidScaling

The other interesting comparison in Figure 6 is between OT-Clicks and BidScaling. BidScaling performs slightly better than OT-Clicks when restricted to head queries, as many advertisers may appear for a relatively small number of queries in the head. Thus OT-Clicks, which doesn’t have the flexibility to scale bids, has a bit less room to maneuver. Over all queries, OT-Clicks has much more scope to differentiate between queries, and hence does slightly better than BidScaling.

However, OT-Clicks may be getting the gains by dropping high bid, high cpc clicks which might still yield more profit for the advertiser than low bid, low cpc clicks. Figure 7 shows how the algorithms do on estimated profit-per-dollar: the sum of the bids minus the total cost, divided by the total cost, over all budget constrained campaigns. (From this point, all figures represent performance using all queries, not just the head.) Here OT-Clicks does much worse than BidScaling, though it is still positive. In fact, even OT-CTR beats OT-Clicks. OT-Profit, which attempts to optimize profit, yields similar results to BidScaling.

Figure 7: Impact on profit-per-dollar, over budget constrained campaigns. The baseline is VPT.

As discussed in the introduction, user experience (quality of ads) is as important as advertiser ROI for long-term success. At first glance, one might expect that optimizing clicks per dollar would yield similar results to optimizing CTR: doesn’t a higher click-through rate mean more clicks? However, what matters for optimizing clicks per dollar (with a fixed budget) is the cost per click, not the click-through rate.

Figure 8 shows the change in CTR for each of the algorithms. OT-CTR dramatically outperforms all other algorithms, which is not surprising since it is the only algorithm explicitly trying to optimize quality. Interestingly, while both OT-Profit and BidScaling gave similar gains in profit-per-dollar, their effect on quality is quite different: OT-Profit improves CTR, while BidScaling reduces CTR.

Figure 8: Impact on CTR (including all campaigns). The baseline is VPT.

6.3.3 Multiple Objectives

We now present results with metrics that blend the CTR and profit objectives. In Section 4 we had conjectured that blended metrics might yield better results than individual metrics, since different advertisers may have better scope for optimization along different dimensions. Figures 9 and 10 show the impact of two blended metrics: ctr · (bid_i − cpc_i)/cpc, and ctr² · (bid_i − cpc_i)/cpc. Notice that OT-CTR-Profit, which uses the former as the metric, almost matches OT-Profit on profit-per-dollar, while yielding significantly higher gains in CTR than OT-Profit. OT-CTR2-Profit further increases CTR gains, for a bit more drop in profit-per-dollar. In addition to validating our conjecture that blended metrics may yield better results, such blended metrics let the search engine pick any arbitrary point in a curve that trades gains in user quality for gains in advertiser value.

Figure 9: Multiple objectives: impact on Profit-per-dollar. The baseline is VPT.

Figure 10: Multiple objectives: impact on CTR. The baseline is VPT.

6.3.4 Summary

For optimizing clicks-per-dollar, OT-Clicks dramatically outperformed LP-Clicks by using all the data. For optimizing profit-per-dollar, OT-Profit matched BidScaling on profit-per-dollar, while yielding better CTR (quality for users). If the primary goal is quality, OT-CTR far outperformed the other algorithms, while still improving advertiser profit-per-dollar and clicks-per-dollar. Finally, blending multiple metrics can yield better results than a single metric, since different advertisers have more scope for optimization along different metrics.

One might wonder whether implementation of such techniques would incentivize budget unconstrained advertisers to lower their budgets and become budget constrained. The answer is negative: BidScaling did slightly better than OT-Profit from an advertiser’s perspective (ignoring quality for users), and BidScaling only used the information available to the search engine, not the additional information available to the advertiser. With additional information about conversion rates for each keyword, or the true value of each click (rather than using the bid as the proxy for value), the advertiser can easily get better ROI by optimizing their campaign (versus becoming budget constrained).

6.4 Live Traffic Experiments

We implemented our algorithms in Google’s production ad serving system, and ran experiments on live traffic, with both OT-CTR and BidScaling. The results were consistent with our simulations, though the magnitude of the gains was significantly less than in the simulations, since we have complete knowledge of future queries in the simulations, unlike the partial predictability of live traffic.

For OT-CTR, the experiments showed statistically significant improvements in quality for users, clicks, and conversions, while revenue was neutral – a Pareto improvement to all objectives. Gains in conversions per dollar were significantly higher than the gains in clicks per dollar, since CTR is correlated with conversion rate. Thus by shifting spend to ads with higher CTR, we also increased the number of conversions.

7. CONCLUSION

We studied the problem of allocating budget constrained spend in order to maximize objectives such as quality for users, or ROI for advertisers. We introduced the concept of fair allocations (analogous to Nash equilibria), and constrained the space of algorithms to those that yielded fair allocations. We were also constrained (in our setting) to not modify bids. We proposed a family of Optimized Throttling algorithms that work within these constraints, and can be used to optimize different objectives. In fact, they can be tuned to pick an arbitrary point in the tradeoff curve between multiple objectives.

Prior approaches such as linear programming and bid scaling

are not applicable in our setting: linear programming yields unfair allocations, and bid scaling changes bids. It was nevertheless interesting to study how much of a penalty (if any) our algorithms pay for working within these constraints (fair allocations, fixed bids). We found that, surprisingly, our algorithms dramatically outperform linear programming – by being fast enough to use all the data rather than being limited to head queries. Our algorithms are also competitive with bid scaling on advertiser metrics, while yielding better ad quality for users.

The Optimized Throttling algorithms are designed for implementation in a high throughput production system. The computation overhead at serving time is negligible: just a few comparisons. The algorithms also have a minimal memory footprint, as little as 8 bytes (plus hash table overhead) per advertiser. Finally, they are robust with respect to errors in estimating future traffic, since they only need the total volume of traffic and the distribution of the chosen metric, not the number of occurrences of each query. We validated our system design by implementing our algorithms in the Google ads serving system, and running experiments on live traffic. The experiments showed significant improvements in both advertiser ROI (conversions per dollar) and user experience.

Acknowledgments: We thank Anshul Kothari for his contributions to the algorithms and system design.

8. REFERENCES

[1] Google automatic bidding product. http://adwords.google.com/support/aw/bin/answer.py?hl=en&answer=113234.
[2] Google conversion optimizer product. http://www.google.com/adwords/conversionoptimizer/.
[3] Protocol buffers. Website, 2008. http://code.google.com/p/protobuf.
[4] Z. Abrams. Revenue maximization when bidders have budgets. In SODA, 2006.
[5] Z. Abrams, S. Keerthi, O. Mendelevitch, and J. Tomlin. Ad delivery with budgeted advertisers: a comprehensive lp approach. J. Electronic Commerce Research, 9(1), 2008.
[6] G. Aggarwal, A. Goel, and R. Motwani. Truthful auctions for pricing search keywords. In EC, 2006.
[7] C. Borgs, J. Chayes, N. Immorlica, K. Jain, O. Etesami, and M. Mahdian. Dynamics of bid optimization in online advertisement auctions. In Proc. of the 16th international conference on World Wide Web, pages 531–540. ACM, 2007.
[8] C. Borgs, J. Chayes, N. Immorlica, M. Mahdian, and A. Saberi. Multi-unit auctions with budget-constrained bidders. In EC, pages 44–51, 2005.
[9] N. Buchbinder, K. Jain, and J. Naor. Online Primal-Dual Algorithms for Maximizing Ad-Auctions Revenue. In ESA, 2007.
[10] M. Cary, A. Das, B. Edelman, I. Giotis, K. Heimerl, A. Karlin, C. Mathieu, and M. Schwarz. Greedy bidding strategies for keyword auctions. In Proc. of the 8th ACM conference on Electronic commerce, pages 262–271. ACM New York, NY, USA, 2007.
[11] D. X. Charles, M. Chickering, N. R. Devanur, K. Jain, and M. Sanghi. Fast algorithms for finding matchings in lopsided bipartite graphs with applications to display ads. In ACM Conference on Electronic Commerce, pages 121–128, 2010.
[12] Y. Chen, P. Berkhin, B. Anderson, and N. Devanur. Real-time bidding algorithms for performance-based display ad allocation. In KDD, pages 1307–1315. ACM, 2011.
[13] N. R. Devanur and T. P. Hayes. The adwords problem: online keyword matching with budgeted bidders under random permutations. In ACM Conference on Electronic Commerce, pages 71–78, 2009.
[14] B. Edelman, M. Ostrovsky, and M. Schwarz. Internet Advertising and the Generalized Second-Price Auction. American Economic Review, 97(1):242–259, 2007.
[15] J. Feldman, M. Henzinger, N. Korula, V. S. Mirrokni, and C. Stein. Online stochastic packing applied to display ad allocation. In ESA (1), pages 182–194, 2010.
[16] J. Feldman, N. Korula, V. S. Mirrokni, S. Muthukrishnan, and M. Pál. Online ad assignment with free disposal. In WINE, pages 374–385, 2009.
[17] J. Feldman, S. Muthukrishnan, M. Pal, and C. Stein. Budget optimization in search-based advertising auctions. In EC, 2007.
[18] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In 19th ACM Symposium on Operating Systems Principles, 2003.
[19] G. Goel and A. Mehta. Online budgeted matching in random input models with applications to Adwords. In SODA, 2008.
[20] K. Hosanagar and V. Cherepanov. Optimal bidding in stochastic budget constrained slot auctions. In EC, 2008.
[21] M. Mahdian, H. Nazerzadeh, and A. Saberi. Allocating online advertisement space with unreliable estimates. In EC, 2007.
[22] A. Mehta, A. Saberi, U. V. Vazirani, and V. V. Vazirani. Adwords and generalized online matching. J. ACM, 54(5), 2007.
[23] R. Pike, S. Dorward, R. Griesemer, and S. Quinlan. Interpreting the data: Parallel analysis with sawzall. Scientific Programming Journal, 13:277–298, 2005.
[24] P. Rusmevichientong and D. Williamson. An adaptive algorithm for selecting profitable keywords for search-based advertising services. In EC, 2006.
[25] H. Varian. Position auctions. International Journal of Industrial Organization, 25(6):1163–1178, 2007.
[26] E. Vee, S. Vassilvitskii, and J. Shanmugasundaram. Optimal online assignment with forecasts. In ACM Conference on Electronic Commerce, pages 109–118, 2010.
