Optimizing Budget Constrained Spend in Search
                               Advertising

                                                     ∗
                     Chinmay Karande                                 Aranyak Mehta                 Ramakrishnan Srikant
                        chinmayk@fb.com                           aranyak@google.com                  srikant@google.com

                                                                   Google Research
                                                                Mountain View, CA, USA

ABSTRACT                                                                               Each advertiser also specifies a daily budget, which is an
Search engine ad auctions typically have a significant frac-                         upper bound on the amount of money they are prepared to
tion of advertisers who are budget constrained, i.e., if al-                        spend each day. While many advertisers use bids as the pri-
lowed to participate in every auction that they bid on, they                        mary knob to control their spend and never hit their budget,
would spend more than their budget. This yields an im-                              there exists a significant fraction of advertisers who would
portant problem: selecting the ad auctions in which these                           spend more than their budget if they participated in ev-
advertisers participate, in order to optimize different system                       ery auction that their keywords match. Search engines of-
objectives such as the return on investment for advertisers,                        ten provide an option to automatically scale the advertiser’s
and the quality of ads shown to users. We present a sys-                            bids [1, 2], but a substantial fraction of budget constrained
tem and algorithms for optimizing such budget constrained                           advertisers do not opt into these programs. For these ad-
spend. The system is designed to be deployed in a large search                      vertisers, the search engine has to determine the subset of
engine, with hundreds of thousands of advertisers, millions of                      auctions the budget constrained advertiser should partici-
searches per hour, and with the query stream being only par-                        pate in. This creates a dependence between auctions on
tially predictable. We have validated the system design by                          different queries, and leads to essentially a matching or an
implementing it in the Google ads serving system and run-                           assignment problem of advertisers to auctions.
ning experiments on live traffic. We have also compared our                              In this paper, we consider the problem of optimized budget
algorithm to previous work that casts this problem as a large                       allocation: allocating advertisers to queries such that budget
linear programming problem limited to popular queries, and                          constraints are satisfied, while simultaneously optimizing a
show that our algorithms yield substantially better results.                        specified objective. Prior work in this area has often chosen
                                                                                    revenue as the objective to optimize. However, the long term
                                                                                    revenue of a search engine depends on providing good value
                                                                                    to users and to advertisers. If users see low quality ads,
Categories and Subject Descriptors                                                  then this can result in ad-blindness and a drop in revenue.
G.1.6 [Optimization]: Linear Programming; K.6.0 [General]:                          If advertisers see low return on investment (ROI), then they
Economics                                                                           will reduce their bids and budgets, again resulting in a drop
                                                                                    in revenue. Thus, we explore two other objectives in this
                                                                                    paper: improving quality, and advertiser ROI.
1. INTRODUCTION
  Search ad auctions have emerged as the primary model for                          Paper Outline We describe the problem of optimized bud-
monetizing the value provided by search engines. Advertis-                          get allocation in Section 2, followed by related work in Sec-
ers use phrases (keywords) to specify the set of queries they                       tion 3. We present our algorithms and system design in Sec-
are interested in, and bid the cost they are prepared to pay                        tion 4, and discuss certain key properties of the algorithm in
per click on their ad. For each search query, the set of ads                        Section 5. In Section 6, we show that our algorithms yield
to show, the order in which they are shown, and the cost                            substantially better results than prior work and also describe
per click for each shown ad are determined via an auction.                          the results of experiments on live traffic. We conclude with
                                                                                    some closing thoughts in Section 7.
∗The author is currently at Facebook, Inc., Menlo Park, CA,
USA. The work described in this paper was done while the
author was at Google.                                                               2. PROBLEM DEFINITION
                                                                                       Let A be the set of advertisers, and Q the set of queries.
Permission to make digital or hard copies of all or part of this work for
                                                                                    Each advertiser a ∈ A comes with a daily budget Ba . Let
personal or classroom use is granted without fee provided that copies are           G(A, Q, E) be a bipartite graph such that for a ∈ A and
not made or distributed for profit or commercial advantage and that copies           q ∈ Q, edge (a, q) ∈ E means that an ad of a is eligible for
bear this notice and the full citation on the first page. To copy otherwise, to      the auction for query q (a’s keywords match q). Let ctr(a,q)
republish, to post on servers or to redistribute to lists, requires prior specific   be the probability of a click on a’s ad for q, and bid(a,q) be
permission and/or a fee.
WSDM’13, February 4–8, 2013, Rome, Italy.
                                                                                    the amount a is willing to pay per click. (Note that ctr(a,q)
Copyright 2013 ACM 978-1-4503-1869-3/13/02 ...$15.00.                               is the position-independent CTR, i.e., the probability of a
click at some chosen fixed position. In other words, ctr(a,q)        or advertiser clicks per dollar. Thus their approach yields
does not depend on the position of the ad.)                         an optimal solution to the click maximization problem in
   When a query q arrives, the eligible ads a for q are ranked      the general GSP setting. However, there are two reasons
by bid(a,q) ctr(a,q) and shown in that order. Denoting the          why this work is not the final word on the allocation ap-
jth ad in the order as aj , the cost per click of aj is set as      proach. First, the LP can only run over head queries due
                                                                    to resource constraints, which brings up an interesting ques-
           \[ \mathrm{cpc}(a_j) = \frac{\mathrm{bid}_{(a_{j+1},q)} \, \mathrm{ctr}_{(a_{j+1},q)}}{\mathrm{ctr}_{(a_j,q)}} \]
                                                                    tion: Can a non-optimal algorithm that runs over the entire
This is known as the generalized second price (GSP) auction         query stream beat an optimal algorithm that is restricted to
(see, e.g., [25, 14, 6]).                                           the head? Second, the LP formulation can yield solutions
   Let Ta denote the spend of the advertiser a if a partici-        that are clearly unfair, in that the allocation for some adver-
pates in all the auctions for which a is eligible via the key-      tisers is very different from what the advertiser would choose
word match (ignoring a’s budget). If Ta > Ba , the advertiser       for themselves for that objective (see Section 5). Hence it
is budget constrained, and the search engine has to limit (or       is unclear whether LP solutions can be deployed by search
throttle) the set of auctions in which the advertiser partici-      engines.
pates.                                                                  A second stream of work focused on optimizing search en-
                                                                    gine revenue in a first price auction. Mehta et al. [22] and
Objectives. For budget constrained advertisers, the search
                                                                    Buchbinder et al. [9] both provide an online approximation
engine can choose to optimize different objectives, such as:
                                                                    algorithm for optimizing revenue, with a best possible ap-
     • the quality of the ads shown, e.g., maximize the position-   proximation guarantee of 1 − 1/e ≈ 63%, for the scenario in
       independent predicted click-through rate,                    which we do not know anything about the future distribu-
     • reduce advertiser cost-per-click (maximize the number        tion of queries. In contrast, we assume that we can predict
       of clicks), or                                               future distributions of queries, albeit noisily. Mahdian et
                                                                    al. [21] extended the algorithm from [22] to provide guaran-
     • increase advertiser ROI.                                     tees in the setting when we have unreliable estimates. [19,
   We do not have full knowledge of the graph G, but only           13] analyzed revenue maximization in an average case set-
information from past data, i.e., the graphs from previous          ting where queries arrive in a random order, or are picked
days. We also require practical algorithms which work on-           i.i.d. from a distribution. All the above papers focus solely
line, i.e., given the next query, decide, in sub-second re-         on search engine revenue, assume a first price auction, and
sponse time, which ad impressions to show. We can now               do not consider multiple slots or positions. We focus on op-
define the problem as follows.                                       timizing very different objectives, such as user experience
                                                                    and advertiser ROI, in the second price GSP auction, with
The Optimized Budget Allocation Problem: Given
                                                                    multiple slots and positions.
information about the past as G′(A, Q′, E′), including the
                                                                        There has been considerable related work in budget allo-
bids and CTR for each advertiser-query pair, the budget
                                                                    cation for display ads, e.g., [11, 12, 15, 16, 26]. This work
Ba , for each advertiser a, and a specified objective function
                                                                    also does not consider second price auctions, or the objec-
to optimize: For each query arriving in an online stream of
                                                                    tives we study.
queries Q, decide which advertisers should participate in the
                                                                        There has also been work on designing new incentive com-
auction.
                                                                    patible auctions in the presence of bidders with budget con-
                                                                    straints, initiated by [4, 8]. Our goal is to optimize budgets
3.    RELATED WORK                                                  in the context of the GSP auction used by search engines,
   There have been two broad approaches to optimizing bud-          and hence we do not consider alternate auction designs.
get constrained spend: allocation and bid modification. Al-          Bid Modification The second approach [7, 10, 17, 20, 24]
location treats bids as fixed, and allows only decisions about       studies the advertiser’s problem of bidding optimally in or-
whether the advertiser should participate in the auction.           der to maximize ROI (or some notion of utility). Changing
This is our setting, where we are constrained to not change         bids in such a manner is an alternate solution to dealing with
bids, but only optimize allocations.                                budget constrained advertisers: if the advertiser permits, the
   The second approach, bid modification, is in a setting            search engine can scale bids down until the advertiser is no
where bids can be changed. This body of work typically              longer budget constrained. However, despite the availability
considers the problem from the advertiser’s perspective, and        of a free option to automatically scale bids [1], a substantial
assumes full knowledge of the information that advertisers          fraction of budget constrained advertisers have not opted in
typically have (such as the value of clicks or conversions).        to use this product. For these advertisers, the search engine
However, this work can also be adapted to be applicable             (as a policy decision) has to use the allocation approach,
from a search engine’s perspective, for advertisers who have        not bid modification. Hence the work on bid modification,
opted in and allow the search engine to change their bids.          while clearly an appealing alternative, is not applicable in
   We next describe the related work in the allocation and          our setting.
bid modification approaches.                                            Despite the lack of applicability, it would be interesting to
Allocation The paper by Abrams et al. [5] is the closest            understand how optimized budget allocation fares against
to our work. They solve the offline problem (with complete            bid scaling. We compare our algorithms to bid scaling in
knowledge of future query frequency) for head queries (the          Section 6.
most popular search queries) in the GSP auction setting us-
ing a linear program (LP), to optimize search engine revenue
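
As an illustration of the GSP pricing rule defined in Section 2, the following minimal Python sketch ranks the eligible ads by bid × ctr and charges each ad the smallest cost per click that keeps it ahead of the next ad. The function name gsp_auction and the (name, bid, ctr) tuple layout are assumptions made for this example. Charging 0 to the last-ranked ad is also an assumption, since the text does not specify a reserve price.

```python
def gsp_auction(ads):
    """Rank ads by bid * ctr and compute GSP cost per click.

    ads: list of (name, bid, ctr) tuples, with ctr > 0 (hypothetical layout).
    Returns a list of (name, cpc) in ranked order.
    """
    # Rank by bid * ctr, highest first.
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    results = []
    for j, (name, bid, ctr) in enumerate(ranked):
        if j + 1 < len(ranked):
            _, next_bid, next_ctr = ranked[j + 1]
            # cpc(a_j) = bid(a_{j+1}) * ctr(a_{j+1}) / ctr(a_j)
            cpc = next_bid * next_ctr / ctr
        else:
            # Assumption: no runner-up and no reserve price, so the price is 0.
            cpc = 0.0
        results.append((name, cpc))
    return results


example = gsp_auction([("a", 2.0, 0.1), ("b", 1.0, 0.25), ("c", 4.0, 0.025)])
```

Because each ad outranks its runner-up, the computed cost per click never exceeds the ad's own bid.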
4.     ALGORITHMS                                                     To achieve this goal, our algorithm uses a third input,
   Given the problem of optimizing budget constrained spend,        the rank Rθ,a of an impression for a given advertiser a and
the first step is to neither over- nor under-spend.1 The naive       metric θ. Define
way to do this is to let advertisers participate in auctions un-       • Fθ,a (µ): Estimated fraction of maximum spend Ta for
til they hit their budget, and then make them ineligible for             which θ(i) ≤ µ. In other words, F is the estimated
the rest of the day. Clearly, this will yield very biased traffic          cumulative distribution function of θ.
to the advertiser, and also skew auction competition towards
the earlier part of the day. The next simplest approach,               • Rθ,a (µ) = 1 - Fθ,a (µ).
which does not have these drawbacks, is Vanilla Probabilis-         The lower the value of the rank R, the better the impression
tic Throttling (VPT).                                               scores on our metric.
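
The rank function R defined above can be estimated from historical data roughly as follows. This is a minimal sketch, assuming that the history of one advertiser is available as a list of (θ, spend) pairs; the name build_rank_function and this data layout are hypothetical and are not taken from the production system.

```python
from bisect import bisect_right


def build_rank_function(history):
    """Return R(mu) = 1 - F(mu) for one advertiser and one metric.

    history: list of (theta, spend) pairs for past eligible impressions
    (hypothetical layout). F(mu) is the fraction of total spend that came
    from impressions with theta <= mu.
    """
    ordered = sorted(history)
    thetas = [theta for theta, _ in ordered]
    total = sum(spend for _, spend in ordered)
    if total <= 0:
        raise ValueError("history must contain positive spend")

    # Running total of spend, in increasing order of theta.
    cumulative = []
    running = 0.0
    for _, spend in ordered:
        running += spend
        cumulative.append(running)

    def rank(mu):
        k = bisect_right(thetas, mu)  # impressions with theta <= mu
        spend_at_or_below = cumulative[k - 1] if k else 0.0
        return 1.0 - spend_at_or_below / total

    return rank


rank = build_rank_function([(0.1, 1.0), (0.2, 1.0), (0.3, 2.0)])
```

A low value of rank(mu) means that only a small fraction of spend comes from impressions scoring above mu.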
   For each advertiser a, define:                                      We now define the Optimized Throttling algorithm:
     • Ba : the remaining budget for the day (or time period).
                                                                            For each arriving query q:
     • Ta : the remaining maximum spend for the rest of the
                                                                              For each budget constrained advertiser a:
       day, i.e., the total spend if the advertiser had unlimited
                                                                                 If Rθ,a (θ(i)) ≤ Ba /Ta , then a
       budget.
                                                                                 participates in the auction.
We now define the Vanilla Probabilistic Throttling al-
gorithm:                                                                            Figure 2: Algorithm OT

          For each arriving query q:                                  While the algorithm appears to be a straightforward greedy
            For each budget constrained advertiser a:               algorithm, there are several subtleties:
               Flip a coin with P [Heads] = Ba /Ta .
                                                                       • The algorithm yields a solution that is “fair” to each
               If heads, a participates in the auction.
                                                                         advertiser (formally defined, and proved in Section 5).

    Figure 1: Vanilla Probabilistic Throttling (VPT)                   • The algorithm yields an optimal fair solution under
                                                                         certain constraints. While we can find many examples
  If our estimate of Ta is accurate, then each advertiser                where the constraints don’t hold, the constraints hold
spends very close to her budget in expectation (and with                 often enough that the algorithm is not too far from
high probability, by Chernoff bounds). Advertisers also re-               optimal in practice.
ceive a representative sample of the traffic they are eligible           • We have transformed the domain from the space of
for.                                                                     queries to the distribution of some property of queries.
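
The two throttling rules in Figures 1 and 2 can be sketched side by side as below. This is an illustrative sketch only: the function names are hypothetical, and returning False when the remaining maximum spend is zero is an assumption that the figures do not address.

```python
import random


def vpt_participates(remaining_budget, remaining_max_spend, rng=random):
    """Vanilla Probabilistic Throttling: coin flip with P[heads] = Ba / Ta."""
    if remaining_max_spend <= 0:
        return False  # Assumption: no eligible spend remains.
    ip = min(1.0, remaining_budget / remaining_max_spend)
    return rng.random() < ip


def ot_participates(impression_rank, remaining_budget, remaining_max_spend):
    """Optimized Throttling: participate iff R(theta(i)) <= Ba / Ta."""
    if remaining_max_spend <= 0:
        return False  # Assumption: no eligible spend remains.
    return impression_rank <= remaining_budget / remaining_max_spend
```

With Ba/Ta = 0.3, VPT admits a random 30% of eligible impressions, whereas OT admits exactly those impressions whose rank places them in the top 30% of spend.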
                                                                         Predicting the frequency of queries in the tail is in-
4.1     Optimized Throttling                                             tractable. Predicting the distribution of specific prop-
  We now present algorithms for optimizing one or more of                erties of the queries in the tail is very tractable (for
the following objectives:                                                properties we use in our algorithms).
     • average quality of ads shown, represented by CTR.               • The choice of metric lets us optimize a wide range
     • clicks per dollar,                                                of objectives, or combinations of objectives (discussed
                                                                         next).
     • conversions per dollar,
                                                                      We now define five instantiations of OT, corresponding
     • the advertiser’s profit, using the difference between the      to the four objectives we listed earlier, and a fifth objective
       bid and the cost per click as the estimate of profit.         that combines quality and profit:
   For the chosen objective, given a candidate ad impression i
(i.e., for a specific query and advertiser), we compute a met-       Instantiation         Objective      θ(i)
ric θ(i) which tracks the desired objective. Given a choice          OT-CTR                Ad quality     ctri
between two impressions (for the same advertiser), we would          OT-Clicks             Clicks         1/cpci
prefer to show the impression with the higher value of the           OT-Profit             Profit         (bidi − cpci)/cpci
metric. For example, if the objective is quality, the metric         OT-Conversions        Conversions    cvri vali/cpci
could be the position-independent predicted click-through            OT-CTR-Profit         Blend          ctri (bidi − cpci)/cpci
rate of the advertiser for this query, reflecting our desire to
show higher-quality impressions. Conceptually, we would
like to rank all the impressions for an advertiser by the de-                   Figure 3: Instantiations of OT
sired metric, and choose the impressions with the highest
metric score until the budget is filled.                                The first metric, ctri, is straightforward: we are using
1
                                                                    position-independent predicted CTR as a proxy for qual-
 While most advertisers are clearly budget constrained or           ity. (Of course, one could use any arbitrary quality metric
unconstrained, advertisers at the margin may switch back-
                                                                    instead of CTR.)
and-forth between the two states, based on traffic. It is
straightforward to handle these marginal cases. For ease of            To understand the next three metrics, it is helpful to
exposition, we ignore this issue in the paper; however our          multiply the numerator and denominator by the position-
implementation does handle these cases gracefully.                  dependent predicted CTR. For example, in OT-Clicks, the
numerator now becomes the expected number of clicks, and          (purely for ease of exposition). Since a search engine has a
the denominator the expected cost, and the ratio is the clicks    large number of advertisers, we would like to compress the
per dollar. Since the total spend is fixed for budget con-         information in R at serving time. We compress R into a
strained advertisers, optimizing clicks is equivalent to opti-    histogram H.
mizing clicks per dollar. 2                                          Recall that R is only used to answer the question of whether
   For OT-Profit, assuming that bidi is the value to the          R(θ(i)) is less than ip (= Ba /Ta ). So if the estimate of ip
advertiser, bidi −cpci is the expected profit to the advertiser    was very stable, we need just two buckets in the histogram
if there is a click on this impression. So the metric for OT-     H, with the boundary at the value c∗ such that R(c∗ ) = ip.
Profit is the expected profit per dollar of spend.                 With two buckets, we would need just 8 bytes of data per
   For OT-Conversions the metric is the expected conver-          budget constrained advertiser: the value c∗ and the value of
sion value per dollar, given the value of a conversion on this    R(c∗ ).
impression vali , and a model that predicts the conversion           We may choose to create additional buckets around the
rate cvri . Building machine learning models for estimat-         threshold c∗ , based on the tradeoff between increased gains
ing cvri is beyond the scope of the paper. However, we            from choosing the highest scoring impressions (see below)
will note the existence of commercial systems that estimate       versus memory constraints. For each bucket m in H ex-
cvri given advertiser-specific conversion data, e.g., [2]. Ad-     cluding the last bucket, we store the bucket boundary and
vertisers do not necessarily need to provide conversion data      the value of R at the bucket boundary. We refer to these
in order to benefit from the techniques in the paper. One          histograms as throttling parameters.
can build models for estimating conversion rate that are not      Using the estimates at serving: When a query arrives,
advertiser-specific, e.g., we found that ctri is correlated with   we need to determine, for each budget constrained advertiser
cvri .                                                            a, whether a participates in the auction. The input consists
   The final metric simply multiplies the metric for CTR           of the histogram H, the current value of ip, and the value
and profit. The intuition is that for advertisers with a lot of    of θ(i) for the current impression i. Let θ(i) be in bucket
variance on CTR but not much on profit, the algorithm will         m. Let Hb (m) denote the upper boundary of bucket m, and
focus on CTR. Similarly for advertisers with more variance        let Hr (m) = R(Hb (m)). Then the advertiser participates in
on profit than CTR, the algorithm will focus on profit. Thus        the auction with the following probability:
if the search engine cares about both CTR and advertiser
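profit, the blended metric will likely yield better results than
simply averaging the results of the individual metrics.
                                                                  \[
                                                                  \Pr[a \text{ participates}] =
                                                                  \begin{cases}
                                                                  1, & \text{if } H_r(m) \le \mathit{ip} \\
                                                                  0, & \text{if } H_r(m-1) \ge \mathit{ip} \\
                                                                  \dfrac{\mathit{ip} - H_r(m-1)}{H_r(m) - H_r(m-1)}, & \text{otherwise}
                                                                  \end{cases}
                                                                  \]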
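
The three-case participation probability can be written as a small function. This sketch assumes that buckets are indexed from the best-scoring impressions to the worst, so that Hr(m − 1) ≤ Hr(m); the function name and argument names are hypothetical.

```python
def participation_probability(hr_prev, hr_curr, ip):
    """Probability that an impression in bucket m participates.

    hr_prev: Hr(m - 1), the rank at the previous bucket boundary.
    hr_curr: Hr(m), the rank at the boundary of bucket m.
    ip:      impression probability Ba / Ta.
    Assumes hr_prev <= hr_curr (buckets ordered best to worst).
    """
    if hr_curr <= ip:
        return 1.0  # Whole bucket lies within the top ip fraction of spend.
    if hr_prev >= ip:
        return 0.0  # Whole bucket lies outside the top ip fraction.
    # The threshold falls inside the bucket: admit the matching share.
    return (ip - hr_prev) / (hr_curr - hr_prev)
```

In the third case hr_prev < ip < hr_curr, so the denominator is strictly positive and the result lies strictly between 0 and 1.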
4.2    System design and implementation
                                                                  The first two cases are straightforward, and follow directly
  We next describe our implementation of the algorithm in
                                                                  from the goal that we (do not) show the impression if it is
the production Google ads serving system. Our system has
                                                                  (not) in the top ip fraction of spend. In the third case, the
three primary components:
                                                                  probability that the impression is in the top ip fraction of
    • estimating Ba /Ta ,                                         spend is given by (ip − Hr (m − 1))/(Hr (m) − Hr (m − 1)),
                                                                  and hence we show the impression with that probability.
    • estimating and compressing Rθ,a , and
                                                                  Implementation: We have built our data collection pipeline
    • using the estimates at serving time.                        on top of Google’s Sawzall [23] infrastructure, which allows
We will use OT-CTR to illustrate the techniques used, and         us to process historical query data in parallel. The throt-
discuss any differences between OT-CTR and the other in-           tling parameters generated by the data collection pipeline are
stantiations as they come up.                                     written to the Google File System (GFS) [18]. The data is
                                                                  stored in the protocol buffer format [3], which reduces the
Estimating Ba /Ta : This component estimates the impres-          storage requirement and makes the transfer and pro-
sion probability ip, where ip is defined as Ba /Ta . In other      cessing of the data more efficient. From GFS, the throttling
words, ip is the probability with which we should allow an        parameters data is picked up by the ads data push system,
impression of the advertiser to participate in an auction,        which writes it to one of its data channels. The ads serving
in order to show her impressions uniformly through the re-        system gets the updated throttling parameters through this
mainder of the day, and exhaust her budget at the end of the      channel.
day. We estimate ip using traffic information from the past,
and using the available budget. Given the inherent noise in       Interactions between budget constrained advertis-
traffic, a feedback loop is used at frequent intervals to adjust    ers: Out of the metrics in Figure 3, ctri , bidi , cvri and
the probability. Clearly, the more accurate the estimate of       vali are all functions purely of the impression, and do not
ip, the more the gains from optimization.                         change based on the other ads in the auction. However,
                                                                  cpci is a function of the runner-up, since we use a second-
Estimating and compressing Rθ,a : We use historical               price auction. We analyzed the logs, and found that budget
data to compute the cumulative distribution function Fθ,a .       constrained ads are more likely to be next to budget uncon-
Rθ,a is a trivial transformation of Fθ,a . In the rest of this    strained ads than to other budget constrained ads. However, we
section, we will drop the subscripts and refer to Rθ,a as R       will have instances with consecutive budget constrained ads.
2
 Some metrics like ctri are purely a function of the impres-      We use an iterative technique for improving the performance
sion. However, metrics like cpc may change based on the           of the online algorithms in these cases.
other ads in the auction. We discuss this issue in Section 4.2.      The intuition behind the iterative technique is to run sev-
eral (simulated) iterations of the auction. In each iteration,
we compute the metric θ for each budget constrained adver-                • Rank the impressions i ∈ Ea in order of decreas-
tiser (including those that do not participate in the auction               ing 1/cpci .
in the current iteration), and based on the value of the met-             • Pick the top impressions in Ea according to this
ric, decide whether that advertiser participates in the next                ranking until the budget runs out, i.e. the largest
iteration. Note that an impression may be removed in one                    prefix Ia , s.t. Σ_{i∈Ia} spendi ≤ Ba , and at most
iteration and re-enter in a subsequent iteration. While we                  one additional fractional impression to finish the
cannot prove convergence, in practice this often converges                  budget.
in a few rounds, or at least leads to an improved solution
over simply using the value of θ from the first iteration.
                                                                         Figure 4: Offline-OT-Clicks-Single-Advertiser
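
The greedy procedure of Figure 4 can be sketched in Python as follows. The sketch assumes each eligible impression is given as an (expected clicks, cpc) pair with cpc > 0, so that its expected spend is the product of the two; the function name and this data layout are hypothetical.

```python
def offline_ot_clicks(impressions, budget):
    """Greedy fractional-knapsack selection for one advertiser.

    impressions: list of (expected_clicks, cpc) pairs, cpc > 0
    (hypothetical layout); expected spend is expected_clicks * cpc.
    Returns (selection, total_clicks), where selection is a list of
    (index, fraction) pairs in the order the impressions were chosen.
    """
    # Rank impressions in order of decreasing 1 / cpc.
    order = sorted(range(len(impressions)),
                   key=lambda i: 1.0 / impressions[i][1],
                   reverse=True)
    remaining = budget
    selection = []
    total_clicks = 0.0
    for i in order:
        expected_clicks, cpc = impressions[i]
        spend = expected_clicks * cpc
        if spend <= remaining:
            # The whole impression fits in the remaining budget.
            selection.append((i, 1.0))
            remaining -= spend
            total_clicks += expected_clicks
        else:
            # At most one fractional impression finishes the budget.
            if remaining > 0:
                fraction = remaining / spend
                selection.append((i, fraction))
                total_clicks += fraction * expected_clicks
            break
    return selection, total_clicks


selection, clicks = offline_ot_clicks([(1.0, 4.0), (2.0, 1.0), (1.0, 2.0)], 5.0)
```

In this example the two cheapest impressions are taken in full, and a quarter of the most expensive one uses up the remaining budget.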
5. KEY PROPERTIES OF ALGORITHM
   The overall effectiveness of the algorithm depends on two           Without fractional impressions, this is the integral knap-
factors: the accuracy of the prediction of future traffic (total        sack problem. Since one click is a tiny fraction of adver-
traffic and the distribution of the metric), and the intrinsic          tiser spend, we allow the choice of one fractional impres-
effectiveness of the algorithm. To understand the latter, we           sion, thereby converting the problem to a fractional knap-
analyze the algorithm on the offline version of the problem,            sack problem. Observing that spendi = αi ctri cpci and
in which the advertiser-query graph is known. We will focus           therefore αi ctri /spendi = 1/cpci , we get the algorithm in
on the instantiations with a linear objective: OT-Clicks,             Figure 4, which is a simple greedy algorithm, using the ratio
OT-Profit and OT-Conversions. For non-linear objec-                   of the expected value from the click to the cost of the click.
tives such as CTR, optimality with even a single budget con-          Theorem 1 follows from the well known optimality of the
strained advertiser may require allocation in a manner that           greedy strategy for the fractional knapsack problem, and its
is clearly against the advertiser’s interest. So algorithms like      proof is omitted here.
OT-CTR which are designed to both be good for advertisers
and improve quality cannot be optimal.                                   Theorem 1. Offline-OT-Clicks-Single-Advertiser com-
   We first consider the special case of a single budget con-          putes an optimal solution to the ROI maximization problem
strained advertiser, and show that our algorithm is optimal           for a single budget constrained advertiser.
for linear objectives. We then discuss the issue of fairness
when there are multiple budget constrained advertisers, and           5.2 Fair Allocations
show by example that linear programming can yield solu-                 We begin with the following definition of an optimal allo-
tions that are optimal but not fair. When we restrict the             cation:
space of solutions to fair solutions, we show that our algorithm
yields an optimal solution even with multiple budget constrained
advertisers, as long as there are no adjacent budget constrained
advertisers in a given auction. Obviously, we do get adjacent budget
constrained advertisers in the real world – but the optimality result
with that constraint suggests that the algorithm will perform well in
practice.

   Definition 1. We call an allocation optimal if it maximizes
   $$\frac{\sum_{a \in A} w_a \sum_{i \in I_a} \alpha_i \, \mathrm{ctr}_i}{\sum_{a \in A} w_a}$$
over all possible allocations (given the wa ≥ 0, which are
arbitrary advertiser specific weights).
   For ease of exposition, we will choose the simplest instan-
tiation, OT-Clicks as the representative algorithm in the             However, this definition has the problem that in trying to
proofs. It is straightforward to sketch out similar proofs for        maximize the weighted average of advertiser ROI, we may
OT-Profit and OT-Conversions.                                         end up sacrificing the interests of some advertisers, as the
                                                                      following example illustrates.
5.1    Optimal For A Single Budget Constrained
       Advertiser                                                        Example 1. There are two budget constrained bidders,
   We start by proving that given G(A, Q, E) and a single             a and b, each with a budget of $100. There are two different
budget constrained advertiser a ∈ A, OT-Clicks maximizes              queries q1 and q2 , each with 100 instances. q1 has a mini-
clicks per dollar for a. While the proof is obvious, it is useful     mum reserve cpc of $1, and q2 has a reserve of $2 (one may
as a building block to more interesting results.                      replace the reserves by an unconstrained bidder, keeping the
   Given an advertiser a, we define:                                   example unchanged). The bidders bid the following values
    • Ea to be the set of impressions in queries where a is           for the queries (a does not bid on q2 ):
      eligible to participate in the auction, and
   • Ia ⊆ Ea to be the set of impressions where a partici-                                          q1   q2
      pates in the auction.                                                                   a     20   −
The total expected clicks is $\sum_{i \in I_a} \alpha_i \, \mathrm{ctr}_i$, where αi is the           b     10   10
position normalizer. For fixed budget, maximizing clicks                                      min     1    2
per dollar is the same as maximizing total clicks, which is
captured by the following linear program: Given a, find
   $$\max_{I_a \subseteq E_a} \sum_{i \in I_a} \alpha_i \, \mathrm{ctr}_i \quad \text{s.t.} \quad \sum_{i \in I_a} \mathrm{spend}_i \le B_a \qquad (1)$$
   For ease of exposition, we assume that the CTR is equal
for all advertisers and query pairs, in both positions. To
maximize total clicks, or equivalently, clicks per dollar, the
optimal solution is to let a participate in q1 , and b in q2 .
Then a gets 100 clicks and b gets 50, giving a total of 150           1. Begin with an allocation with only budget uncon-
clicks at a cost-per-click of $1.33. But this solution is not            strained advertisers.
fair to b, who would rather show for q1 and get a cpc of $1           2. For each budget constrained advertiser a ∈ A (in
and hence 100 clicks instead of 50. In this scenario a would             turn):
get only 10 clicks at a cpc of $10, giving a total of 110 clicks         –Run Offline-OT-Clicks-Single-Advertiser for a.
at an average cpc of $1.81.                                   2          –Update I.

  Example 1 motivates the following definition:                        3. If the allocation has not converged, go to step 2.

   Definition 2. We define an allocation I ⊆ E to be a fair
allocation for the clicks objective, if ∀ a ∈ A:                          Figure 5: Algorithm Offline-OT-Clicks.

  a. The expected spend of a is equal to Ba (a exhausts its
     budget), or Ia = Ea (we show every impression of a it         due to the insertion or removal of other budget constrained
     is eligible for).                                             impressions). As an aside, this key property also holds for
                                                                   first price auctions, and the following results also hold for
  b. Given the allocation of other advertisers (i.e. I \ Ia ), Ia
                                                                   first price auctions.
     maximizes the total expected clicks that a can obtain
                                                                      Given this property, for each advertiser a, the 1 / cpc or-
     within its budget.
                                                                   dering of the impressions in Ea is fixed throughout the al-
We call an allocation I an optimal fair allocation if it is        gorithm, independent of the allocations of the other adver-
a fair allocation, and it maximizes                                tisers. From the definition, it is easy to see that every fair
                                                                   allocation has the property that for each a ∈ A, Ia is a prefix
          $$\frac{\sum_{a \in A} w_a \sum_{i \in I_a} \alpha_i \, \mathrm{ctr}_i}{\sum_{a \in A} w_a}$$
                                                                   of this fixed ordering of Ea , since a non-prefix would violate
                                                                   Property (b) in Definition 2. By definition of the algorithm,
among all fair allocations (where the wa ≥ 0 are arbitrary         this is true also for the allocations produced during every
advertiser specific weights).                                       step of the algorithm. We call allocations with this property
                                                                   prefix-allocations.
   We can similarly define fair allocations and optimal fair
allocations for the profit objective and the conversions ob-           Lemma 2. Algorithm Offline-OT-Clicks converges to a
jective.                                                           fair allocation if there are no adjacent constrained bidders
   In Example 1, the allocation which maximizes the sum            in any query.
(giving 150 clicks) is not a fair allocation, while allocating         Proof. Fix an advertiser a, and consider the allocation
both a and b to q1 is (even though this reduces the total          to a in each round. We claim that the prefix chosen for
clicks to 110). We note that our definition of fair allocation      a only increases in length in subsequent rounds. Since for
is analogous to that of a Nash equilibrium in games.               each advertiser the prefix cannot increase indefinitely, this
                                                                   means that the algorithm converges to some allocation, say
5.3    Optimal Fair Allocation When No Adja-                       I ∗ . Property (a) in Definition 2 holds for I ∗ because of the
       cent Budget Constrained Advertisers                         termination condition of Offline-OT-Clicks. Property (b)
  When there are multiple budget constrained advertisers,          holds by definition of Offline-OT-Clicks-Single-Advertiser,
a natural local algorithm is to cycle over the different ad-        which is run in every round of Offline-OT-Clicks.
vertisers until convergence, running Algorithm Offline-OT-               It remains to prove the claim that the prefix for a can only
Clicks-Single-Advertiser for each advertiser a. In the re-         increase in each round. Suppose this is true up to the time
mainder of this section we analyze this algorithm, which is        we process advertiser a in round k. In between the times
defined in Figure 5.                                                we process a in rounds k and k + 1, the algorithm may have
  In a GSP auction, there are two ways in which the intro-         introduced impressions of other advertisers in the auctions
duction or removal of an impression i of one budget con-           for which we show a’s impressions in round k. The only
strained advertiser can affect an impression i′ of another:         effect this can have on a’s impression is to possibly lower
                                                                   its position, and therefore its expected spend. Thus, in
   • i can change the cpc of i′                                    round k + 1, the algorithm may need to pick a longer prefix
                                                                   to finish a’s budget.
   • i can change the position of i′ , and hence the expected
     spend from i′ .                                               For two prefix-allocations I, J, we say I ⪯ J if for every
                                                                   advertiser a ∈ A, the prefix length in I is at most the prefix
Due to this interaction, it is unclear whether Algorithm
                                                                   length in J, and therefore Ia ⊆ Ja .
Offline-OT-Clicks always yields an optimal fair allocation.
However, we will show that it does converge to an optimal fair       Lemma 3. Let I ∗ be the fair allocation that Algorithm
allocation when there are no adjacent budget constrained           Offline-OT-Clicks converges to (when there are no adja-
bidders, i.e., in the auction ranking of all eligible bidders,     cent constrained bidders). Then
there are no consecutive budget constrained bidders. The
                                                                                 I∗ ⪯ I, for all fair allocations I
key property is that in such a scenario, the cpc of a budget
constrained impression is independent of other budget con-           Proof. If this is not true for some fair allocation I, then
strained impressions (even though the position may change          consider the first time during the run of the algorithm that
some advertiser a’s prefix becomes longer than its prefix
in I. Comparing to I, the algorithm’s current allocation
has all advertisers a′ ≠ a with smaller or equal prefixes.
Thus the position normalizers of a’s impressions are larger
or equal during this step of the algorithm than in the
allocation I. This implies that the prefix of a in the current
allocation should be no longer than that in I, since
the expected spend in each auction is at least that in I, a
contradiction.

   $$\max \sum_{q,s,i} \alpha_{si} \, \mathrm{ctr}_{qsi} \, x_{qs}$$
   Allocation constraint: $$\sum_{s} x_{qs} \le N_q \quad \forall q$$
   Budget constraint: $$\sum_{q,s} \mathrm{cpc}_{qsi} \, \alpha_{si} \, \mathrm{ctr}_{qsi} \, x_{qs} \le B_i \quad \forall i$$
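To make the LP variables concrete, the data of Example 1 can be written as a toy instance of this linear program. The sketch below is illustrative only: it assumes α · ctr = 1 for every shown ad (which matches the click counts quoted in Example 1), derives the cpc values from GSP pricing with the stated reserves, and uses a coarse grid search as a stand-in for a real LP solver. All names (`CPC`, `clicks`, `feasible`, `grid_search`) are hypothetical.

```python
from itertools import product

N = {"q1": 100, "q2": 100}      # N_q: number of instances of each query
B = {"a": 100.0, "b": 100.0}    # B_i: budgets

# (query, set s) -> {bidder: cpc}. GSP price is the next bid or the reserve:
# with both a and b in q1, a pays b's bid (10) and b pays the reserve (1).
CPC = {
    ("q1", ("a",)): {"a": 1.0},
    ("q1", ("b",)): {"b": 1.0},
    ("q1", ("a", "b")): {"a": 10.0, "b": 1.0},
    ("q2", ("b",)): {"b": 2.0},
}
CLICKS_PER_SHOW = 1.0           # assumed value of alpha_si * ctr_qsi


def clicks(x):
    """LP objective: expected clicks summed over q, s and bidders i in s."""
    return sum(CLICKS_PER_SHOW * len(CPC[k]) * n for k, n in x.items())


def spend(x, bidder):
    """Left-hand side of the budget constraint for one bidder."""
    return sum(CLICKS_PER_SHOW * CPC[k].get(bidder, 0.0) * n
               for k, n in x.items())


def feasible(x):
    """Check the allocation and budget constraints for x = {(q, s): x_qs}."""
    for q, nq in N.items():
        if sum(n for (qq, _), n in x.items() if qq == q) > nq:
            return False
    return all(spend(x, i) <= B[i] for i in B)


def grid_search(step=10):
    """Brute-force stand-in for an LP solver on this four-variable instance."""
    keys = list(CPC)
    best, best_x = -1.0, None
    for combo in product(range(0, 101, step), repeat=len(keys)):
        x = dict(zip(keys, combo))
        if feasible(x) and clicks(x) > best:
            best, best_x = clicks(x), x
    return best, best_x


# The click-maximizing allocation and the fair allocation from Example 1.
x_opt = {("q1", ("a",)): 100, ("q2", ("b",)): 50}
x_fair = {("q1", ("a", "b")): 10, ("q1", ("b",)): 90}
print(clicks(x_opt), clicks(x_fair), grid_search()[0])
```

Both allocations satisfy the constraints, but only the first attains the LP optimum; the second is the fair allocation of Section 5.2, which illustrates why an unconstrained LP can return solutions that are optimal but not fair.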
                                                                     One can optimize for other linear metrics, such as rev-
Note that Lemma 3 implies that there is a unique ⪯-minimal         enue by suitably changing the objective function. It is not
fair allocation, and that the algorithm converges to it.           possible to directly optimize for non-linear metrics such as
                                                                   CTR.
   Theorem 4. Algorithm Offline-OT-Clicks converges to
                                                                     The advantage of LP (compared to our approach) is that
an optimal fair allocation if there are no adjacent constrained
                                                                   the LP fully incorporates interactions between different bud-
bidders in any query.
                                                                   get constrained advertisers. However, even the most efficient
   Proof. From Lemma 2 we know that the algorithm con-             LP solvers cannot solve linear programs on the volume of
verges to a fair allocation I ∗ . From Lemma 3 we get that         data a search engine sees in a day. Thus we have to limit
for every advertiser a, and every fair allocation I, a’s pre-      LP to the head portion of query traffic, and use an approach
fix in I ∗ is no longer than its prefix in I. Thus a spends          such as Vanilla Probabilistic Throttling on the long tail.
the same amount of money (in expectation) in I ∗ as in I,
but spends it on a subset (I ∗ )a ⊆ Ia such that impressions       6.1.2 Bid Scaling
in Ia \ (I ∗ )a have lower (or equal) ratios of value of click to    Both the OT algorithms and the LP formulation work un-
cost of click, as impressions in (I ∗ )a . This implies that a     der the constraint that the bidder’s inputs (bids) can not be
gets at least as much total expected value in I ∗ as in I, in      changed by the search engine, but they can only be throt-
turn implying that I ∗ maximizes any weighted average (over        tled. Bid Scaling algorithms go outside this design space by
advertisers) of the total expected value obtained, among all       lowering the bids of the constrained bidders appropriately.
fair allocations.                                                  As we discussed in Section 1, bid scaling is not an option for
                                                                   our problem. Nevertheless, it is interesting to compare our
  As mentioned earlier, it is easy to prove similar statements
                                                                   approach against bid scaling.
about other linear objectives such as optimizing profit and
                                                                      Our bid scaling algorithm finds one bid multiplier per bid-
optimizing conversions.
                                                                   der and applies it to all the advertiser’s bids, similar to the
                                                                   Budget Optimizer product in Google Adwords [1]. The mul-
6.    EMPIRICAL EVALUATION                                         tiplier is calculated so as to spend exactly the bidder’s bud-
   We report two types of experiments below, offline simu-           get.
lations to compare different optimized allocation solutions,
and live experiments on Google traffic. We start with a              6.2 Simulation Methodology
description of prior approaches: LP and BidScaling.                   We conducted simulations using a 20% sample of all US
                                                                   queries made over a week to the Google search engine. We
6.1     LP and BidScaling                                          sorted queries by the total impressions for that query (summed
                                                                   over all query instances), and picked the queries with the
6.1.1    Linear Programming                                        most impressions as the “head” queries for the LP. The num-
   Assuming complete knowledge of the data, a theoretical          ber of queries in the head was chosen so that the LP could
benchmark for any budget allocation algorithm can be ob-           run in memory. (Note that as we move from the head to the
tained via linear programming [5]. For any query q, let s          tail, each query has relatively few instances. So even dou-
denote the set of bidders chosen to participate in the auction     bling the memory will not substantially increase the fraction
by an allocation mechanism. Let C(q) be the collection of all      of revenue or clicks covered by the “head” queries.) For each
such sets, generated by any conceivable algorithm. Clearly,        set (head and tail), we computed an appropriate budget for
we can completely specify an allocation policy by specifying       that set by scaling down the total budgets. For each query
for each query q, the set s ∈ C(q) of bidders permitted by         we also have the candidate ads together with the relevant
the algorithm to participate in the auction for q.                 metrics, namely, the bid and the predicted CTR. Our sim-
   We can then attempt to discover the best allocation policy      ulation is offline, i.e., the set of queries and candidates is
(for a given linear objective, say click maximization) using       fixed. Since all the algorithms we simulate are time inde-
linear programming. Let Nq be the number of times query q          pendent (as opposed to some of the bid scaling algorithms
appears in the data and xqs be the number of times the set         studied earlier, e.g., [22, 9, 13, 16, 15]), we do not need to
s of bidders is selected for query q. For a bidder i ∈ s, let      worry about the time-arrival order of the queries.
cpcqsi and ctrqsi be the cost-per-click and click-through rate        We simulated the following throttling algorithms for our
for i in the auction for q. Let αsi be the position normalizer     first set of comparisons:
for impression i. Let Bi be the budget of bidder i. Then,
                                                                     1. VPT: Vanilla Probabilistic Throttling.
we can discover the best allocation policy that maximizes
total clicks provided to budget constrained advertisers as           2. OT-Clicks: Optimized Throttling, objective is clicks
the solution of the following linear program:                           (or inverse of cpc).
  3. OT-CTR: Optimized Throttling, objective is CTR.
  4. BidScaling, as described in Section 6.1.2.
  5. LP-Clicks: Click maximizing Linear Program on head
     queries.
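For item 4, the per-bidder multiplier of Section 6.1.2 can be illustrated with a simple bisection. This is only a sketch under strong assumptions that are not from the paper: each auction is reduced to a triple (bid, price, expected clicks), the advertiser wins whenever the scaled bid is at least the price, and the winner pays the price per click. Because spend is then a step function of the multiplier, the sketch returns the largest multiplier found that does not overspend, rather than one that spends the budget exactly. The names `spend_at` and `find_multiplier` are hypothetical.

```python
def spend_at(multiplier, auctions):
    """Expected spend when every bid is scaled by `multiplier`.

    `auctions` is a list of (bid, price, expected_clicks) triples; this
    single-slot pricing model is an assumption made for illustration.
    """
    return sum(price * clicks
               for bid, price, clicks in auctions
               if multiplier * bid >= price)


def find_multiplier(auctions, budget, iters=50):
    """Bisect for the largest multiplier in [0, 1] that stays within budget."""
    if spend_at(1.0, auctions) <= budget:
        return 1.0  # the advertiser is not budget constrained
    lo, hi = 0.0, 1.0  # invariant: spend_at(lo) <= budget < spend_at(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if spend_at(mid, auctions) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


# Made-up data: three auctions priced at 2, 5 and 8, each with a bid of 10.
auctions = [(10, 2, 5), (10, 5, 4), (10, 8, 3)]
m = find_multiplier(auctions, budget=30)
print(m, spend_at(m, auctions))
```

With this input the search settles just below the multiplier at which the most expensive auction would be entered, so the two cheaper auctions are won and the third is dropped.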

6.3     Results
   We show the changes in various objectives relative to the
baseline of Vanilla Probabilistic Throttling (VPT). It is im-
portant to note that while we expect the overall conclusions
to carry over to an online setting where the query distribu-
tion changes over time, the exact numbers will change. In
general, the gains from optimized budget allocation or bid
scaling will be significantly lower in live experiments due to
changes in query traffic. For this reason, as well as data           Figure 6: Impact on clicks-per-dollar, over budget
confidentiality, we omit the scale from our graphs below.           constrained campaigns. The baseline is VPT.
6.3.1    Comparison with LP
   Figure 6 shows the change in clicks per dollar for budget       point, all figures represent performance using all queries,
constrained advertisers for each of the algorithms. The first       not just the head.) Here OT-Clicks does much worse than
set of numbers, “head”, show the results when we artificially       BidScaling, though it is still positive. In fact, even OT-
restrict all the algorithms to operate over the same set of        CTR beats OT-Clicks. OT-Profit, which attempts to
head queries as LP-Clicks, with VPT on the tail. Since             optimize profit, yields similar results to BidScaling.
LP-Clicks is not just optimal, but can also generate solu-            As discussed in the introduction, user experience (qual-
tions that are not fair (unlike the other algorithms), it is not   ity of ads) is as important as advertiser ROI for long-term
surprising that LP-Clicks outperforms the alternatives.            success. At first glance, one might expect that optimizing
   However, when we allow the algorithms to optimize over          clicks per dollar would yield similar results to optimizing
the entire dataset – the “all” numbers – the algorithms that       CTR: doesn’t a higher click-through rate mean more clicks?
can use the full data dramatically outperform LP-Clicks.           However, what matters for optimizing clicks per dollar (with
In fact, even OT-CTR, which is optimizing CTR and not              a fixed budget) is the cost per click, not the click-through
CPC, yields a higher drop in CPC (or equivalently, more            rate.
clicks per dollar) than LP-Clicks. The reason for the poor            Figure 8 shows the change in CTR for each of the al-
performance of LP-Clicks is that the LP can be run only            gorithms. OT-CTR dramatically outperforms all other al-
on the head, and even though the head queries account for          gorithms, not surprising since it is the only algorithm ex-
a substantial portion of revenue, they are relatively homo-        plicitly trying to optimize quality. Interestingly, while both
geneous – the potential gains from optimization are more           OT-Profit and BidScaling gave similar gains in profit-
in the tail than the head. We found that this held for the         per-dollar, their effect on quality is quite different: OT-
other metrics as well, i.e., the substantial majority of the       Profit improves CTR, while BidScaling reduces CTR.
gains from optimization came from the tail queries.3
                                                                   6.3.3    Multiple Objectives
6.3.2    Comparison with BidScaling                                  We now present results with metrics that blend the CTR
   The other interesting comparison in Figure 6 is between         and profit objectives. In Section 4 we had conjectured that
OT-Clicks and BidScaling. BidScaling performs slightly             blended metrics might yield better results than individual
better than OT-Clicks when restricted to head queries, as          metrics, since different advertisers may have better scope
many advertisers may appear for a relatively small number          for optimization along different dimensions. Figures 9 and
of queries in the head. Thus OT-Clicks, which doesn’t have         10 show the impact of two blended metrics: ctr(bidi −
the flexibility to scale bids, has a bit less room to maneu-        cpci )/cpc, and ctr2 (bidi − cpci )/cpc. Notice that OT-
ver. Over all queries, OT-Clicks has much more scope to            CTR-Profit, which uses the former as the metric, almost
differentiate between queries, and hence does slightly better       matches OT-Profit on profit-per-dollar, while yielding sig-
than BidScaling.                                                   nificantly higher gains in CTR than OT-Profit. OT-CTR2-
   However, OT-Clicks may be getting the gains by drop-            Profit further increases CTR gains, for a bit more drop in
ping high bid, high cpc clicks which might still yield more        profit-per-dollar. In addition to validating our conjecture
profit for the advertiser than low bid, low cpc clicks. Figure 7    that blended metrics may yield better results, such blended
shows how the algorithms do on estimated profit-per-dollar:         metrics let the search engine pick any arbitrary point in a
the sum of the bids minus the total cost, divided by the to-       curve that trades gains in user quality for gains in advertiser
tal cost, over all budget constrained campaigns. (From this        value.
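The two blended metrics of Section 6.3.3 differ only in the exponent on CTR, which a short sketch makes explicit. The function name `blended_metric`, its `ctr_power` parameter, and the sample impressions are assumptions for illustration; the formulas themselves are the ones given in the text.

```python
def blended_metric(ctr, bid, cpc, ctr_power=1):
    """ctr**ctr_power * (bid - cpc) / cpc.

    ctr_power=1 gives the metric used by OT-CTR-Profit and ctr_power=2 the
    one used by OT-CTR2-Profit. cpc is assumed to be positive.
    """
    return (ctr ** ctr_power) * (bid - cpc) / cpc


# Two made-up impressions: x has the higher profit margin, y the higher CTR.
x = dict(ctr=0.10, bid=4.0, cpc=1.0)
y = dict(ctr=0.30, bid=1.8, cpc=1.0)

for power in (1, 2):
    mx = blended_metric(ctr_power=power, **x)
    my = blended_metric(ctr_power=power, **y)
    print(power, "prefers", "x" if mx > my else "y")
```

Raising the exponent shifts the ranking toward the higher-CTR impression, which is consistent with the reported trade of a little profit-per-dollar for additional CTR gains.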
3
  An implementation of LP that used more resources could           6.3.4    Summary
narrow the gap by increasing the fraction of queries covered
by the “head”. However, as the number of distinct queries            For optimizing clicks-per-dollar, OT-Clicks dramatically
increases rapidly for each fraction of additional coverage,        outperformed LP-Clicks by using all the data. For op-
every doubling of resources will only yield incremental gains.     timizing profit-per-dollar, OT-Profit matched BidScal-
Figure 7: Impact on profit-per-dollar, over budget                Figure 9: Multiple objectives: impact on Profit-per-
constrained campaigns. The baseline is VPT.                      dollar. The baseline is VPT.




Figure 8: Impact on CTR (including all campaigns).               Figure 10: Multiple objectives: impact on CTR. The
The baseline is VPT.                                             baseline is VPT.


ing on profit-per-dollar, while yielding better CTR (quality      was significantly less than in the simulations, since we have
for users). If the primary goal was quality, OT-CTR blew         complete knowledge of future queries in the simulations, un-
away the other algorithms, while still improving advertiser      like the partial predictability of live traffic.
profit-per-dollar and clicks-per-dollar. Finally, blending mul-      For OT-CTR, the experiments showed statistically signif-
tiple metrics can yield better results than a single metric,     icant improvements in quality for users, clicks, and conver-
since different advertisers have more scope for optimization      sions, while revenue was neutral – a Pareto improvement to
along different metrics.                                          all objectives. Gains in conversions per dollar were signifi-
   One might wonder whether implementation of such tech-         cantly higher than the gains in clicks per dollar, since CTR
niques would incentivize budget unconstrained advertisers        is correlated with conversion rate. Thus by shifting spend
to lower their budgets and become budget constrained. The        to ads with higher CTR, we also increased the number of
answer is negative: BidScaling did slightly better than          conversions.
OT-Profit from an advertiser’s perspective (ignoring qual-
ity for users), and BidScaling only used the information         7. CONCLUSION
available to the search engine, not the additional informa-         We studied the problem of allocating budget constrained
tion available to the advertiser. With additional information    spend in order to maximize objectives such as quality for
about conversion rates for each keyword, or the true value       users, or ROI for advertisers. We introduced the concept of
of each click (rather than using the bid as the proxy for        fair allocations (analogous to Nash equilibriums), and con-
value), the advertiser can easily get better ROI by optimizing   allocations. We were also constrained (in our setting) to not
their campaign (versus becoming budget constrained).             allocations. We were also constrained (in our setting) to not
                                                                 modify bids. We proposed a family of Optimized Throttling
6.4   Live Traffic Experiments                                    algorithms that work within these constraints, and can be
  We implemented our algorithms in Google’s production           used to optimize different objectives. In fact, they can be
ad serving system, and ran experiments on live traffic, with       tuned to pick an arbitrary point in the tradeoff curve be-
both OT-CTR and BidScaling. The results were consis-             tween multiple objectives.
tent with our simulations, though the magnitude of the gains        Prior approaches such linear programming and bid scaling
are not applicable in our setting: linear programming yields      [11] D. X. Charles, M. Chickering, N. R. Devanur, K. Jain,
unfair allocations, and bid scaling changes bids. It was nev-          and M. Sanghi. Fast algorithms for finding matchings
ertheless interesting to study how much of a penalty (if any)          in lopsided bipartite graphs with applications to
our algorithms pay for working within these constraints (fair          display ads. In ACM Conference on Electronic
allocations, fixed bids). We found that, surprisingly, our al-          Commerce, pages 121–128, 2010.
gorithms dramatically outperform linear programming – by          [12] Y. Chen, P. Berkhin, B. Anderson, and N. Devanur.
being fast enough to use all the data rather than being lim-           Real-time bidding algorithms for performance-based
ited to head queries. Our algorithms are also competitive              display ad allocation. In KDD, pages 1307–1315.
with bid scaling on advertiser metrics, while yielding better          ACM, 2011.
ad quality for users.                                             [13] N. R. Devanur and T. P. Hayes. The adwords problem:
   The Optimized Throttling algorithms are designed for im-            online keyword matching with budgeted bidders under
plementation in a high throughput production system. The               random permutations. In ACM Conference on
computation overhead at serving time is negligible: just a             Electronic Commerce, pages 71–78, 2009.
few comparisons. The algorithms also have a minimal mem-          [14] B. Edelman, M. Ostrovsky, and M. Schwarz. Internet
ory footprint, as little as 8 bytes (plus hash table overhead)         Advertising and the Generalized Second-Price
per advertiser. Finally, they are robust with respect to errors        Auction. American Economic Review, 97(1):242–259,
in estimating future traffic, since they only need the total             2007.
volume of traffic and the distribution of the chosen metric,        [15] J. Feldman, M. Henzinger, N. Korula, V. S. Mirrokni,
not the number of occurrences of each query. We validated              and C. Stein. Online stochastic packing applied to
our system design by implementing our algorithms in the                display ad allocation. In ESA (1), pages 182–194,
Google ads serving system, and running experiments on live             2010.
traffic. The experiments showed significant improvements in
both advertiser ROI (conversions per dollar) and user experience.
                                                                  [16] J. Feldman, N. Korula, V. S. Mirrokni,
                                                                       S. Muthukrishnan, and M. Pál. Online ad assignment
                                                                       with free disposal. In WINE, pages 374–385, 2009.
Acknowledgments: We thank Anshul Kothari for his con-             [17] J. Feldman, S. Muthukrishnan, M. Pal, and C. Stein.
tributions to the algorithms and system design.                        Budget optimization in search-based advertising
                                                                       auctions. In EC, 2007.
8. REFERENCES                                                     [18] S. Ghemawat, H. Gobioff, and S.-T. Leung. The
                                                                       Google File System. In 19th ACM Symposium on
 [1] Google automatic bidding product.                                 Operating Systems Principles, 2003.
     http://adwords.google.com/support/aw/bin/                    [19] G. Goel and A. Mehta. Online budgeted matching in
     answer.py?hl=en&answer=113234.                                    random input models with applications to Adwords.
 [2] Google conversion optimizer product. http:                        In SODA, 2008.
     //www.google.com/adwords/conversionoptimizer/.               [20] K. Hosanagar and V. Cherepanov. Optimal bidding in
 [3] Protocol buffers. Website, 2008.                                   stochastic budget constrained slot auctions. In EC,
     http://code.google.com/p/protobuf.                                2008.
 [4] Z. Abrams. Revenue maximization when bidders have            [21] M. Mahdian, H. Nazerzadeh, and A. Saberi.
     budgets. In SODA, 2006.                                           Allocating online advertisement space with unreliable
 [5] Z. Abrams, S. Keerthi, O. Mendelevitch, and                       estimates. In EC, 2007.
     J. Tomlin. Ad delivery with budgeted advertisers: a          [22] A. Mehta, A. Saberi, U. V. Vazirani, and V. V.
     comprehensive lp approach. J. Electronic Commerce                 Vazirani. Adwords and generalized online matching. J.
     Research, 9(1), 2008.                                             ACM, 54(5), 2007.
 [6] G. Aggarwal, A. Goel, and R. Motwani. Truthful               [23] R. Pike, S. Dorward, R. Griesemer, and S. Quinlan.
     auctions for pricing search keywords. In EC, 2006.                Interpreting the data: Parallel analysis with sawzall.
 [7] C. Borgs, J. Chayes, N. Immorlica, K. Jain,                       Scientific Programming Journal, 13:277–298, 2005.
     O. Etesami, and M. Mahdian. Dynamics of bid                  [24] P. Rusmevichientong and D. Williamson. An adaptive
     optimization in online advertisement auctions. In                 algorithm for selecting profitable keywords for
     Proc. of the 16th international conference on World               search-based advertising services. In EC, 2006.
     Wide Web, pages 531–540. ACM, 2007.                          [25] H. Varian. Position auctions. International Journal of
 [8] C. Borgs, J. Chayes, N. Immorlica, M. Mahdian, and                Industrial Organization, 25(6):1163–1178, 2007.
     A. Saberi. Multi-unit auctions with                          [26] E. Vee, S. Vassilvitskii, and J. Shanmugasundaram.
     budget-constrained bidders. In EC, pages 44–51, 2005.             Optimal online assignment with forecasts. In ACM
 [9] N. Buchbinder, K. Jain, and J. Naor. Online                       Conference on Electronic Commerce, pages 109–118,
     Primal-Dual Algorithms for Maximizing Ad-Auctions                 2010.
     Revenue. In ESA, 2007.
[10] M. Cary, A. Das, B. Edelman, I. Giotis, K. Heimerl,
     A. Karlin, C. Mathieu, and M. Schwarz. Greedy
     bidding strategies for keyword auctions. In Proc. of
     the 8th ACM conference on Electronic commerce,
     pages 262–271. ACM New York, NY, USA, 2007.

we process a in rounds k and k + 1, the algorithm may have In a GSP auction, there are two ways in which the intro- introduced impressions of other advertisers in the auctions duction or removal of an impression i of one budget con- for which we show a’s impressions in round k. The only strained advertiser can effect an impression i of another: effect this can have on a’s impression is to possibly lower its position, and therefore of its expected spend. Thus, in • i can change the cpc of i round k + 1, the algorithm may need to pick a longer prefix to finish a’s budget. • i can change the position of i , and hence the expected spend from i . For two prefix-allocations I, J, we say I J if for every advertiser a ∈ A, the prefix length in I is at most the prefix Due to this interaction, it is unclear whether Algorithm length in J, and therefore Ia ⊆ Ja . Offline-OT-Clicks always yields an optimal fair allocation. However we will show that it does converge to a optimal fair Lemma 3. Let I ∗ be the fair allocation that Algorithm allocation when there are no adjacent budget constrained Offline-OT-Clicks converges to (when there are no adja- bidders, i.e., in the auction ranking of all eligible bidders, cent constrained bidders). Then there are no consecutive budget constrained bidders. The I∗ I, for all fair allocations I key property is that in such a scenario, the cpc of a budget constrained impression is independent of other budget con- Proof. If this is not true for some fair allocation I, then strained impressions (even though the position may change consider the first time during the run of the algorithm that
some advertiser a's prefix becomes longer than its prefix in I. Comparing to I, the algorithm's current allocation has all advertisers a′ ≠ a with smaller or equal prefixes. Thus the position normalizers of a's impressions are larger or equal during this step of the algorithm than in the allocation I. This implies that the prefix of a in the current allocation should be shorter than or equal to that in I, since the expected spend in each auction is at least that in I, a contradiction.

Note that Lemma 3 implies that there is a unique \(\preceq\)-minimal fair allocation, and that the algorithm converges to it.

Theorem 4. Algorithm Offline-OT-Clicks converges to an optimal fair allocation if there are no adjacent constrained bidders in any query.

Proof. From Lemma 2 we know that the algorithm converges to a fair allocation \(I^*\). From Lemma 3 we get that for every advertiser a, and every fair allocation I, a's prefix in \(I^*\) is no longer than its prefix in I. Thus a spends the same amount of money (in expectation) in \(I^*\) as in I, but spends it on a subset \((I^*)_a \subseteq I_a\) such that impressions in \(I_a \setminus (I^*)_a\) have lower (or equal) ratios of value of click to cost of click than impressions in \((I^*)_a\). This implies that a gets at least as much total expected value in \(I^*\) as in I, in turn implying that \(I^*\) maximizes any weighted average (over advertisers) of the total expected value obtained, among all fair allocations.

As mentioned earlier, it is easy to prove similar statements about other linear objectives such as optimizing profit and optimizing conversions.

6. EMPIRICAL EVALUATION

We report two types of experiments below: offline simulations to compare different optimized allocation solutions, and live experiments on Google traffic. We start with a description of prior approaches: LP and BidScaling.

6.1 LP and BidScaling

6.1.1 Linear Programming

Assuming complete knowledge of the data, a theoretical benchmark for any budget allocation algorithm can be obtained via linear programming [5]. For any query q, let s denote the set of bidders chosen to participate in the auction by an allocation mechanism. Let C(q) be the collection of all such sets, generated by any conceivable algorithm. Clearly, we can completely specify an allocation policy by specifying for each query q, the set s ∈ C(q) of bidders permitted by the algorithm to participate in the auction for q.

We can then attempt to discover the best allocation policy (for a given linear objective, say click maximization) using linear programming. Let \(N_q\) be the number of times query q appears in the data and \(x_{qs}\) be the number of times the set s of bidders is selected for query q. For a bidder i ∈ s, let \(cpc_{qsi}\) and \(ctr_{qsi}\) be the cost-per-click and click-through rate for i in the auction for q. Let \(\alpha_{si}\) be the position normalizer for impression i. Let \(B_i\) be the budget of bidder i. Then, we can discover the best allocation policy that maximizes total clicks provided to budget constrained advertisers as the solution of the following linear program:

\[ \max \sum_{q,s,i} \alpha_{si} \, ctr_{qsi} \, x_{qs} \]

Allocation constraint:

\[ \sum_{s} x_{qs} \le N_q \quad \forall q \]

Budget constraint:

\[ \sum_{q,s} cpc_{qsi} \, \alpha_{si} \, ctr_{qsi} \, x_{qs} \le B_i \quad \forall i \]

One can optimize for other linear metrics, such as revenue, by suitably changing the objective function. It is not possible to directly optimize for non-linear metrics such as CTR.

The advantage of LP (compared to our approach) is that the LP fully incorporates interactions between different budget constrained advertisers. However, even the most efficient LP solvers cannot solve linear programs on the volume of data a search engine sees in a day. Thus we have to limit LP to the head portion of query traffic, and use an approach such as Vanilla Probabilistic Throttling on the long tail.

6.1.2 Bid Scaling

Both the OT algorithms and the LP formulation work under the constraint that the bidder's inputs (bids) cannot be changed by the search engine; they can only be throttled. Bid Scaling algorithms go outside this design space by lowering the bids of the constrained bidders appropriately. As we discussed in Section 1, bid scaling is not an option for our problem. Nevertheless, it is interesting to compare our approach against bid scaling.

Our bid scaling algorithm finds one bid multiplier per bidder and applies it to all the advertiser's bids, similar to the Budget Optimizer product in Google Adwords [1]. The multiplier is calculated so as to spend exactly the bidder's budget.

6.2 Simulation Methodology

We conducted simulations using a 20% sample of all US queries made over a week to the Google search engine. We sorted queries by the total impressions for that query (summed over all query instances), and picked the queries with the most impressions as the “head” queries for the LP. The number of queries in the head was chosen so that the LP could run in memory. (Note that as we move from the head to the tail, each query has relatively few instances. So even doubling the memory will not substantially increase the fraction of revenue or clicks covered by the “head” queries.) For each set (head and tail), we computed an appropriate budget for that set by scaling down the total budgets. For each query we also have the candidate ads together with the relevant metrics, namely, the bid and the predicted CTR. Our simulation is offline, i.e., the set of queries and candidates is fixed. Since all the algorithms we simulate are time independent (as opposed to some of the bid scaling algorithms studied earlier, e.g., [22, 9, 13, 16, 15]), we do not need to worry about the time-arrival order of the queries.

We simulated the following throttling algorithms for our first set of comparisons:

  1. VPT: Vanilla Probabilistic Throttling.
  2. OT-Clicks: Optimized Throttling, objective is clicks (or inverse of cpc).
  3. OT-CTR: Optimized Throttling, objective is CTR.
  4. BidScaling, as described in Section 6.1.2.
  5. LP-Clicks: Click maximizing Linear Program on head queries.

6.3 Results

We show the changes in various objectives relative to the baseline of Vanilla Probabilistic Throttling (VPT). It is important to note that while we expect the overall conclusions to carry over to an online setting where the query distribution changes over time, the exact numbers will change. In general, the gains from optimized budget allocation or bid scaling will be significantly lower in live experiments due to changes in query traffic. For this reason, as well as data confidentiality, we omit the scale from our graphs below.

6.3.1 Comparison with LP

Figure 6 shows the change in clicks per dollar for budget constrained advertisers for each of the algorithms. The first set of numbers, “head”, shows the results when we artificially restrict all the algorithms to operate over the same set of head queries as LP-Clicks, with VPT on the tail. Since LP-Clicks is not just optimal, but can also generate solutions that are not fair (unlike the other algorithms), it is not surprising that LP-Clicks outperforms the alternatives.

Figure 6: Impact on clicks-per-dollar, over budget constrained campaigns. The baseline is VPT.

However, when we allow the algorithms to optimize over the entire dataset (the “all” numbers), the algorithms that can use the full data dramatically outperform LP-Clicks. In fact, even OT-CTR, which is optimizing CTR and not CPC, yields a higher drop in CPC (or equivalently, more clicks per dollar) than LP-Clicks. The reason for the poor performance of LP-Clicks is that the LP can be run only on the head, and even though the head queries account for a substantial portion of revenue, they are relatively homogeneous: the potential gains from optimization are more in the tail than the head. We found that this held for the other metrics as well, i.e., the substantial majority of the gains from optimization came from the tail queries.³

³ An implementation of LP that used more resources could narrow the gap by increasing the fraction of queries covered by the “head”. However, as the number of distinct queries increases rapidly for each fraction of additional coverage, every doubling of resources will only yield incremental gains.

6.3.2 Comparison with BidScaling

The other interesting comparison in Figure 6 is between OT-Clicks and BidScaling. BidScaling performs slightly better than OT-Clicks when restricted to head queries, as many advertisers may appear for a relatively small number of queries in the head. Thus OT-Clicks, which doesn't have the flexibility to scale bids, has a bit less room to maneuver. Over all queries, OT-Clicks has much more scope to differentiate between queries, and hence does slightly better than BidScaling.

However, OT-Clicks may be getting the gains by dropping high bid, high cpc clicks which might still yield more profit for the advertiser than low bid, low cpc clicks. Figure 7 shows how the algorithms do on estimated profit-per-dollar: the sum of the bids minus the total cost, divided by the total cost, over all budget constrained campaigns. (From this point, all figures represent performance using all queries, not just the head.) Here OT-Clicks does much worse than BidScaling, though it is still positive. In fact, even OT-CTR beats OT-Clicks. OT-Profit, which attempts to optimize profit, yields similar results to BidScaling.

Figure 7: Impact on profit-per-dollar, over budget constrained campaigns. The baseline is VPT.

As discussed in the introduction, user experience (quality of ads) is as important as advertiser ROI for long-term success. At first glance, one might expect that optimizing clicks per dollar would yield similar results to optimizing CTR: doesn't a higher click-through rate mean more clicks? However, what matters for optimizing clicks per dollar (with a fixed budget) is the cost per click, not the click-through rate.

Figure 8 shows the change in CTR for each of the algorithms. OT-CTR dramatically outperforms all other algorithms, which is not surprising since it is the only algorithm explicitly trying to optimize quality. Interestingly, while both OT-Profit and BidScaling gave similar gains in profit-per-dollar, their effect on quality is quite different: OT-Profit improves CTR, while BidScaling reduces CTR.

Figure 8: Impact on CTR (including all campaigns). The baseline is VPT.

6.3.3 Multiple Objectives

We now present results with metrics that blend the CTR and profit objectives. In Section 4 we had conjectured that blended metrics might yield better results than individual metrics, since different advertisers may have better scope for optimization along different dimensions. Figures 9 and 10 show the impact of two blended metrics: \(ctr \cdot (bid_i - cpc_i)/cpc\) and \(ctr^2 \cdot (bid_i - cpc_i)/cpc\). Notice that OT-CTR-Profit, which uses the former as the metric, almost matches OT-Profit on profit-per-dollar, while yielding significantly higher gains in CTR than OT-Profit. OT-CTR2-Profit further increases CTR gains, for a bit more drop in profit-per-dollar. In addition to validating our conjecture that blended metrics may yield better results, such blended metrics let the search engine pick any arbitrary point in a curve that trades gains in user quality for gains in advertiser value.

Figure 9: Multiple objectives: impact on Profit-per-dollar. The baseline is VPT.

Figure 10: Multiple objectives: impact on CTR. The baseline is VPT.

6.3.4 Summary

For optimizing clicks-per-dollar, OT-Clicks dramatically outperformed LP-Clicks by using all the data. For optimizing profit-per-dollar, OT-Profit matched BidScaling on profit-per-dollar, while yielding better CTR (quality for users). If the primary goal was quality, OT-CTR blew away the other algorithms, while still improving advertiser profit-per-dollar and clicks-per-dollar. Finally, blending multiple metrics can yield better results than a single metric, since different advertisers have more scope for optimization along different metrics.

One might wonder whether implementation of such techniques would incentivize budget unconstrained advertisers to lower their budgets and become budget constrained. The answer is negative: BidScaling did slightly better than OT-Profit from an advertiser's perspective (ignoring quality for users), and BidScaling only used the information available to the search engine, not the additional information available to the advertiser. With additional information about conversion rates for each keyword, or the true value of each click (rather than using the bid as the proxy for value), the advertiser can easily get better ROI by optimizing their campaign (versus becoming budget constrained).

6.4 Live Traffic Experiments

We implemented our algorithms in Google's production ad serving system, and ran experiments on live traffic, with both OT-CTR and BidScaling. The results were consistent with our simulations, though the magnitude of the gains was significantly less than in the simulations, since we have complete knowledge of future queries in the simulations, unlike the partial predictability of live traffic.

For OT-CTR, the experiments showed statistically significant improvements in quality for users, clicks, and conversions, while revenue was neutral: a Pareto improvement to all objectives. Gains in conversions per dollar were significantly higher than the gains in clicks per dollar, since CTR is correlated with conversion rate. Thus by shifting spend to ads with higher CTR, we also increased the number of conversions.

7. CONCLUSION

We studied the problem of allocating budget constrained spend in order to maximize objectives such as quality for users, or ROI for advertisers. We introduced the concept of fair allocations (analogous to Nash equilibriums), and constrained the space of algorithms to those that yielded fair allocations. We were also constrained (in our setting) to not modify bids. We proposed a family of Optimized Throttling algorithms that work within these constraints, and can be used to optimize different objectives. In fact, they can be tuned to pick an arbitrary point in the tradeoff curve between multiple objectives.

Prior approaches such as linear programming and bid scaling
are not applicable in our setting: linear programming yields unfair allocations, and bid scaling changes bids. It was nevertheless interesting to study how much of a penalty (if any) our algorithms pay for working within these constraints (fair allocations, fixed bids). We found that, surprisingly, our algorithms dramatically outperform linear programming, by being fast enough to use all the data rather than being limited to head queries. Our algorithms are also competitive with bid scaling on advertiser metrics, while yielding better ad quality for users.

The Optimized Throttling algorithms are designed for implementation in a high throughput production system. The computation overhead at serving time is negligible: just a few comparisons. The algorithms also have a minimal memory footprint, as little as 8 bytes (plus hash table overhead) per advertiser. Finally, they are robust with respect to errors in estimating future traffic, since they only need the total volume of traffic and the distribution of the chosen metric, not the number of occurrences of each query. We validated our system design by implementing our algorithms in the Google ads serving system, and running experiments on live traffic. The experiments showed significant improvements in both advertiser ROI (conversions per dollar) and user experience.

Acknowledgments: We thank Anshul Kothari for his contributions to the algorithms and system design.

8. REFERENCES

[1] Google automatic bidding product. http://adwords.google.com/support/aw/bin/answer.py?hl=en&answer=113234.
[2] Google conversion optimizer product. http://www.google.com/adwords/conversionoptimizer/.
[3] Protocol buffers. Website, 2008. http://code.google.com/p/protobuf.
[4] Z. Abrams. Revenue maximization when bidders have budgets. In SODA, 2006.
[5] Z. Abrams, S. Keerthi, O. Mendelevitch, and J. Tomlin. Ad delivery with budgeted advertisers: a comprehensive lp approach. J. Electronic Commerce Research, 9(1), 2008.
[6] G. Aggarwal, A. Goel, and R. Motwani. Truthful auctions for pricing search keywords. In EC, 2006.
[7] C. Borgs, J. Chayes, N. Immorlica, K. Jain, O. Etesami, and M. Mahdian. Dynamics of bid optimization in online advertisement auctions. In Proc. of the 16th international conference on World Wide Web, pages 531–540. ACM, 2007.
[8] C. Borgs, J. Chayes, N. Immorlica, M. Mahdian, and A. Saberi. Multi-unit auctions with budget-constrained bidders. In EC, pages 44–51, 2005.
[9] N. Buchbinder, K. Jain, and J. Naor. Online Primal-Dual Algorithms for Maximizing Ad-Auctions Revenue. In ESA, 2007.
[10] M. Cary, A. Das, B. Edelman, I. Giotis, K. Heimerl, A. Karlin, C. Mathieu, and M. Schwarz. Greedy bidding strategies for keyword auctions. In Proc. of the 8th ACM conference on Electronic commerce, pages 262–271. ACM New York, NY, USA, 2007.
[11] D. X. Charles, M. Chickering, N. R. Devanur, K. Jain, and M. Sanghi. Fast algorithms for finding matchings in lopsided bipartite graphs with applications to display ads. In ACM Conference on Electronic Commerce, pages 121–128, 2010.
[12] Y. Chen, P. Berkhin, B. Anderson, and N. Devanur. Real-time bidding algorithms for performance-based display ad allocation. In KDD, pages 1307–1315. ACM, 2011.
[13] N. R. Devanur and T. P. Hayes. The adwords problem: online keyword matching with budgeted bidders under random permutations. In ACM Conference on Electronic Commerce, pages 71–78, 2009.
[14] B. Edelman, M. Ostrovsky, and M. Schwarz. Internet Advertising and the Generalized Second-Price Auction. American Economic Review, 97(1):242–259, 2007.
[15] J. Feldman, M. Henzinger, N. Korula, V. S. Mirrokni, and C. Stein. Online stochastic packing applied to display ad allocation. In ESA (1), pages 182–194, 2010.
[16] J. Feldman, N. Korula, V. S. Mirrokni, S. Muthukrishnan, and M. Pál. Online ad assignment with free disposal. In WINE, pages 374–385, 2009.
[17] J. Feldman, S. Muthukrishnan, M. Pal, and C. Stein. Budget optimization in search-based advertising auctions. In EC, 2007.
[18] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In 19th ACM Symposium on Operating Systems Principles, 2003.
[19] G. Goel and A. Mehta. Online budgeted matching in random input models with applications to Adwords. In SODA, 2008.
[20] K. Hosanagar and V. Cherepanov. Optimal bidding in stochastic budget constrained slot auctions. In EC, 2008.
[21] M. Mahdian, H. Nazerzadeh, and A. Saberi. Allocating online advertisement space with unreliable estimates. In EC, 2007.
[22] A. Mehta, A. Saberi, U. V. Vazirani, and V. V. Vazirani. Adwords and generalized online matching. J. ACM, 54(5), 2007.
[23] R. Pike, S. Dorward, R. Griesemer, and S. Quinlan. Interpreting the data: Parallel analysis with sawzall. Scientific Programming Journal, 13:277–298, 2005.
[24] P. Rusmevichientong and D. Williamson. An adaptive algorithm for selecting profitable keywords for search-based advertising services. In EC, 2006.
[25] H. Varian. Position auctions. International Journal of Industrial Organization, 25(6):1163–1178, 2007.
[26] E. Vee, S. Vassilvitskii, and J. Shanmugasundaram. Optimal online assignment with forecasts. In ACM Conference on Electronic Commerce, pages 109–118, 2010.