An Adaptive Proportional Value-per-Click Agent for Bidding in Ad Auctions
      Trading Agent Design and Analysis Workshop 2011

  Kyriakos C. Chatzidimitriou              AUTH/CERTH
  Lampros C. Stavrogiannis        Univ. of Southampton
  Andreas L. Symeonidis                    AUTH/CERTH
  Pericles A. Mitkas                       AUTH/CERTH
Introduction
• Basic idea: taken from the working paper of Dr. Yevgeniy
  Vorobeychik on the QuakTAC 2009 entry
• Since this initial work, we have:
      –   Conducted more Game-Theoretic experiments
      –   Improved conversions estimation
      –   Improved user distribution estimation
      –   Included an adaptive component
• Ended up with (more or less) the same
  “Ultimate Answer to the Ultimate Question of Life, The
  Universe, and Everything” for the TAC Ad Auctions Game: α = 0.3

TADA@IJCAI 2011                Mertacor                          2
Basic Strategy: VPC
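$$bid^{q}_{d+1} = \underbrace{\alpha}_{D} \cdot \hat{v}^{q}_{d+1}$$

$$\hat{v}^{q} = \widehat{\Pr}^{q}\{conversion \mid click\} \cdot \underbrace{E[revenue^{q} \mid conversion]}_{A}$$

$$\widehat{\Pr}^{q}\{conversion \mid click\} = \underbrace{\widehat{focusedPercentage}^{q}}_{B} \cdot \Pr^{q}\{conversion \mid focused\}(\underbrace{\hat{I}_{d+1}}_{C})$$

• The labels A, B, C and D mark the terms detailed on the slides with the same letters
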
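The chain of three formulas on the Basic Strategy slide can be traced numerically with a minimal sketch like the one below. The function names and all input numbers are hypothetical and chosen only for illustration; the slides do not give concrete values, and Pr{conversion | focused} is passed in as an already computed number.

```python
def conversion_given_click(focused_percentage, pr_conv_given_focused):
    """Pr{conversion | click} = focusedPercentage * Pr{conversion | focused}."""
    return focused_percentage * pr_conv_given_focused


def value_per_click(pr_conv_given_click, expected_revenue):
    """v = Pr{conversion | click} * E[revenue | conversion]."""
    return pr_conv_given_click * expected_revenue


def next_bid(alpha, vpc):
    """bid = alpha * v, where alpha is the shading factor."""
    return alpha * vpc


# Hypothetical inputs, for illustration only.
pr_click = conversion_given_click(focused_percentage=0.5, pr_conv_given_focused=0.2)
vpc = value_per_click(pr_click, expected_revenue=10.0)
bid = next_bid(alpha=0.3, vpc=vpc)
```

With these made-up inputs the value per click is about 1.0, so the bid is roughly 0.3: the agent bids 30% of what a click is estimated to be worth.
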
A) Expected Revenue
• Depends solely on the Manufacturer’s Specialty (MS)

$$E[revenue^{q} \mid conversion] = \begin{cases} USP \cdot (3 + MSB) / 3 & \text{MS not defined in } q \\ USP \cdot (1 + MSB) & \text{MS matched in } q \\ USP & \text{MS not matched in } q \end{cases}$$

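The three cases for the expected revenue can be written as a small function. This is a sketch under assumptions: the function signature, the use of `None` for a query without a manufacturer, and the numeric values of USP and MSB in the usage lines are all hypothetical and not taken from the slides.

```python
def expected_revenue(usp, msb, query_manufacturer, specialty):
    """E[revenue | conversion] for a query, given the agent's manufacturer specialty."""
    if query_manufacturer is None:
        # MS not defined in q: the bonus applies to one manufacturer out of three.
        return usp * (3 + msb) / 3
    if query_manufacturer == specialty:
        # MS matched in q: the full bonus applies.
        return usp * (1 + msb)
    # MS not matched in q: no bonus.
    return usp


# Hypothetical values, for illustration only.
USP, MSB = 10.0, 0.5
undefined_case = expected_revenue(USP, MSB, None, "pg")
matched_case = expected_revenue(USP, MSB, "pg", "pg")
unmatched_case = expected_revenue(USP, MSB, "other", "pg")
```

The undefined case equals the average of one matched and two unmatched outcomes, which is where the (3 + MSB) / 3 factor comes from.
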
B) Focused Percentage
• Monte Carlo Simulations
• First Method (Vorobeychik)
      – focusedPercentage_query = conversions_query /
        [clicks_query * Pr(conversion_query)]
      – Average over query class (F0, F1, F2)
• Second Method (2011)
      – Use server source files
      – MC states (NS, IS, F0, F1, F2, T) per product (x9)
      – focusedPercentage_query = Fi_query / (Fi_query + IS_query)

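Both focused-percentage estimates reduce to a one-line ratio. The sketch below assumes the counts are already available (from game reports for the first method, from a simulated user population for the second); the function names and the sample numbers are hypothetical.

```python
def focused_percentage_from_reports(conversions, clicks, pr_conversion):
    """First method: conversions / (clicks * Pr(conversion)) for one query."""
    if clicks <= 0 or pr_conversion <= 0:
        raise ValueError("clicks and pr_conversion must be positive")
    return conversions / (clicks * pr_conversion)


def focused_percentage_from_states(focused_users, informational_users):
    """Second method: Fi / (Fi + IS) from simulated user-state counts."""
    total = focused_users + informational_users
    if total == 0:
        return 0.0
    return focused_users / total


def class_average(per_query_values):
    """Average the per-query estimates over one query class (F0, F1 or F2)."""
    return sum(per_query_values) / len(per_query_values)


# Hypothetical counts, for illustration only.
first = focused_percentage_from_reports(conversions=10, clicks=100, pr_conversion=0.2)
second = focused_percentage_from_states(focused_users=30, informational_users=70)
```
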
Graph for query (pg, null)




C) I_d Estimation

$$\hat{I}_{d+1} = g\left(c_{d-3} + c_{d-2} + c_{d-1} + \hat{c}_{d} + \hat{c}_{d+1} - C^{cap}\right)$$

• kNN
      – Inspired by the periodic behavior of conversions
      – Time series matching using Euclidean distance as the
        similarity criterion
      – k = 5, t = 5, N = 600
• Heuristic Baseline
      – Underestimate conversions in order to bid higher:
        \hat{c}_d = (c_{d-1} + c_{d-2} + c_{d-3}) / 4
• Aggregate
      – \hat{c}_d     = (kNN + Baseline) / 2
      – \hat{c}_{d+1} = ((kNN + Baseline) / 2) / 2

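The estimation pipeline on this slide (kNN forecast, heuristic baseline, aggregation, then the argument of g) might look roughly like the sketch below. Several points are assumptions rather than facts from the slides: t is read as the window length, N is not modelled, the kNN forecast is taken as the mean successor of the k closest windows, and g is assumed to have the form λ^max(0, x) with λ = 0.995, following my reading of the TAC Ad Auctions capacity rule.

```python
import math


def knn_forecast(history, k=5, t=5):
    """Average the successors of the k past windows closest to the last t values."""
    if len(history) < t + 1:
        raise ValueError("need at least t + 1 observations")
    query = history[-t:]
    scored = []
    # Every earlier window of length t that has a known successor.
    for start in range(len(history) - t):
        window = history[start:start + t]
        scored.append((math.dist(window, query), history[start + t]))
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    return sum(successor for _, successor in nearest) / len(nearest)


def baseline_forecast(c_dm1, c_dm2, c_dm3):
    """Deliberate underestimate: three past days divided by four."""
    return (c_dm1 + c_dm2 + c_dm3) / 4


def aggregate(knn, baseline):
    """Return the estimates for day d and day d+1 as given on the slide."""
    c_hat_d = (knn + baseline) / 2
    return c_hat_d, c_hat_d / 2


def capacity_discounter(c_dm3, c_dm2, c_dm1, c_hat_d, c_hat_dp1, capacity, lam=0.995):
    """Assumed form of g: lam ** max(0, conversions in the window - capacity)."""
    excess = c_dm3 + c_dm2 + c_dm1 + c_hat_d + c_hat_dp1 - capacity
    return lam ** max(0.0, excess)
```

A caller would feed the daily conversion history to `knn_forecast`, the three latest known days to `baseline_forecast`, and pass both results through `aggregate` before evaluating `capacity_discounter`.
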
kNN example




Cyclic behavior
[Diagram: no ad display -> no conversions -> high conversion prob. -> high VPC -> high bid -> high ranking -> conversions -> low conversion prob. -> low VPC -> low bid -> no ad display]
• 5-day long pulses
• Pulse height and width are related to factors like the user
  distribution at the time and the competition
• Large peaks in daily profits come from “catching the wave”

Rest of the strategy
• Budget unconstrained
• Hard-coded ad selection strategy
      – F0 => generic
      – F2 => if user preference matched => targeted
      – F1 => if one of the preferences is matched =>
        targeted, else generic
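The hard-coded ad selection rules can be sketched as a lookup on the query type. This is only an illustration under assumptions: how a "match" is determined is left to the caller (the slide does not define it), the F2 rule is read as requiring both preferences to match, and the generic fallback for F2 is an assumption because the slide states no else branch for that case.

```python
def select_ad(query_type, matched_preferences):
    """Return "generic" or "targeted" for a query.

    matched_preferences is the number of query attributes (0, 1 or 2)
    that the caller considers matched; its definition is hypothetical.
    """
    if query_type == "F0":
        return "generic"
    if query_type == "F1":
        return "targeted" if matched_preferences >= 1 else "generic"
    if query_type == "F2":
        # Assumption: both preferences must match; fallback is generic.
        return "targeted" if matched_preferences == 2 else "generic"
    raise ValueError(f"unknown query type: {query_type}")
```
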




Simulation-based Game Theoretical Analysis
• One-shot Bayesian game
• Myopic linear strategies b = α ∙ vpc: find the
  optimal shading factor α
• Iterative best response to find a symmetric
  Bayes-Nash equilibrium (BNE)
• Find the most profitable single deviation from a
  homogeneous set of opponents and repeat until
  self-play is a best response: that profile is the BNE

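The iterative best-response loop described above can be sketched as follows. In the actual analysis the payoff of a deviation comes from game simulations; here `toy_payoff` is a purely hypothetical closed-form stand-in so that the loop can run, and the candidate grid is likewise an assumption.

```python
def iterated_best_response(payoff, candidates, start, max_iters=50):
    """Repeat: find the best single deviation against a homogeneous field.

    payoff(deviator, field) is the deviator's profit when every opponent
    plays `field`. Stops when no candidate beats self-play.
    """
    current = start
    path = [current]
    for _ in range(max_iters):
        best = max(candidates, key=lambda a: payoff(a, current))
        if payoff(best, current) <= payoff(current, current):
            return current, path  # self-play is a best response
        current = best
        path.append(current)
    raise RuntimeError("no fixed point found within max_iters")


def toy_payoff(deviator, field):
    """Hypothetical payoff: the best reply to `field` is 0.14 + 0.5 * field."""
    return -(deviator - (0.14 + 0.5 * field)) ** 2


alphas = [i / 10 for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0
equilibrium, path = iterated_best_response(toy_payoff, alphas, start=1.0)
```

With this toy payoff the loop happens to settle at 0.3, but that is a property of the made-up function, not a reproduction of the simulation results.
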
D) alpha
• Vorobeychik
      – “α = 0.2, 0.3 more robust to aggressive
        opponents”
      – The previous best values found (α = 0.1, 0.2 in 2009)
        were not profitable on the 2010 platform
• We re-ran the algorithm under the 2010
  specs
      – α = 0.3 is the optimal value (best-response path: 1 -> 0.4 -> 0.3)

Simulation-based Game Theoretical Analysis
• Instead of a single α, use $(\alpha_{F0}, \alpha_{F1}, \alpha_{F2}) \times (\alpha_{C_{LOW}}, \alpha_{C_{MED}}, \alpha_{C_{HIGH}})$

• Start from the optimal α = 0.3 and explore all possible
  deviations for each α, first for the query levels and then
  for the capacity levels

• 0.3 seems to be optimal in all cases

• Points in between do not yield different results (0.3 is
  still the best)

Simulation-based Game Theoretical Analysis
[Figures: results of the simulation-based analysis, two slides]

Adaptive component
• Problem Statement
       We want to capture the cases where, under the
       current environment (competition conditions),
       an α other than 0.3 yields a competitive advantage
• The GT analysis is “a good starting point”
• Model it as an associative k-armed bandit
  problem with optimistic initial values and an
  ε-greedy action selection strategy

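An associative (contextual) k-armed bandit with optimistic initial values and ε-greedy selection can be sketched in a few lines. The class below is a generic textbook-style illustration, not the agent's actual code: the value of ε, the constant step size, and the optimistic initial value are hypothetical parameters that the slides do not specify.

```python
import random
from collections import defaultdict


class AssociativeBandit:
    """Tabular bandit with one action-value estimate per (state, action)."""

    def __init__(self, actions, optimistic_value, epsilon=0.1, step_size=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.step_size = step_size
        self.rng = random.Random(seed)
        # Unvisited pairs start at the optimistic value, which drives exploration.
        self.q = defaultdict(lambda: optimistic_value)

    def select(self, state):
        """Explore with probability epsilon, otherwise act greedily."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward):
        """Move the estimate a fraction of the way toward the observed reward."""
        key = (state, action)
        self.q[key] += self.step_size * (reward - self.q[key])
```

In use, the agent would call `select` with the current state to pick α for the day and `update` once the daily profit is known.
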
State, Action, Reward
• State
      – Quantized VPC (x11)
      – Capacity (x3)
      – Query Type (x3)
      – Manufacturer Specialty Bonus (x2)
      – Component Specialty Bonus (x2)
• Action: α ∈ {0.28, 0.29, 0.3, 0.31, 0.32}
• Reward: r = daily profits

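The state factors listed above multiply out to 11 x 3 x 3 x 2 x 2 = 396 states, or 1980 state-action pairs with the five α values. The sketch below shows one way to flatten such a state into a table index; the uniform VPC binning, the `vpc_max` parameter, and the ordering of the factors are assumptions, since the slides only give the number of levels per factor.

```python
VPC_BINS = 11
CAPACITIES = ("LOW", "MEDIUM", "HIGH")
QUERY_TYPES = ("F0", "F1", "F2")
ACTIONS = (0.28, 0.29, 0.3, 0.31, 0.32)

NUM_STATES = VPC_BINS * len(CAPACITIES) * len(QUERY_TYPES) * 2 * 2
NUM_PAIRS = NUM_STATES * len(ACTIONS)


def quantize_vpc(vpc, vpc_max, bins=VPC_BINS):
    """Map a VPC in [0, vpc_max] to one of `bins` equal-width bins (assumed scheme)."""
    if vpc_max <= 0:
        raise ValueError("vpc_max must be positive")
    ratio = min(max(vpc / vpc_max, 0.0), 1.0)
    return min(int(ratio * bins), bins - 1)


def encode_state(vpc_bin, capacity, query_type, ms_bonus, cs_bonus):
    """Flatten the five state factors into a single index in [0, NUM_STATES)."""
    index = vpc_bin
    index = index * len(CAPACITIES) + CAPACITIES.index(capacity)
    index = index * len(QUERY_TYPES) + QUERY_TYPES.index(query_type)
    index = index * 2 + int(ms_bonus)
    index = index * 2 + int(cs_bonus)
    return index
```
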
Experiment (1/2)
• Self-play
   – 210 games
   – All capacities set to 450 (MEDIUM)
• The standard agent is unbeatable in self-play by construction

Agent Name        Score
Mertacor-Std-1    53.042
Mertacor-Std-2    52.763
Mertacor-kNN-1    52.673
Mertacor-kNN-2    52.703
Mertacor-RL-1     52.270
Mertacor-RL-2     52.233
Mertacor-Full-1   51.673
Mertacor-Full-2   51.899

Experiment (2/2)
• Mix things up: include more agents with different strategies
      – 250 games
      – All capacities set to 450 (MEDIUM)
• Better estimation leads to better performance
• Adaptiveness is suited to even more complicated
  environments (capacity- and strategy-wise)

Agent Name         Score
Mertacor-kNN       53.223
Mertacor-Std       52.245
Schlemazl (2010)   51.975
Mertacor-Full      51.796
Mertacor-RL        51.790
Epflagent (2010)   49.232
Tau (2010)         45.987
Crocodile (2010)   45.858

2011: Also tested / under development
• Daily Campaign Budget Threshold algorithms
      – Estimation
      – Simulation
• Particle Filtering for user state estimation
      – TacTex

Conclusions & Future Work
• α = 0.3 is a very powerful conclusion: it is hard to
  beat
• Better estimates for B) user state and C) I_d
  could further improve performance
• On-line learning is still in a very crude form: we are
  not yet satisfied with it, but it seems a reasonable
  thing to do
• Competition-wise: fitted-Q learning from data
  logs

Thank you for your attention

         Questions?
