Goodhart’s Law with Examples

The British economist Charles Goodhart is credited with the adage: “When a measure becomes a target, it ceases to be a good measure.” His original statement was: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

Adam Gustafson asked for solutions (https://bit.ly/adamGustafsonGoodhartQuestion), and my answer is: be careful with correlational metrics and set up guardrails.

It helps to share examples, so here are three; the first is one I mention in my class (https://bit.ly/ABClassRKLI) in the OEC (Overall Evaluation Criterion) session:

  1. Physical activity correlates with improved health, so health insurance companies offer discounts if you exercise. The metric? Steps counted by a phone or a Fitbit. See https://www.npr.org/sections/health-shots/2018/11/19/668266197/as-insurers-offer-discounts-for-fitness-trackers-wearers-should-step-with-cautio. Step count correlates well with exercise and is a reasonable metric, until it becomes THE metric to optimize and users buy phone swing cradles: https://www.facebook.com/watch/?v=845186309178629 . They’re $11.90 on Amazon: https://www.amazon.com/dp/B08C7XG7G2
  2. Bing, as a search engine, is measured on query share, that is, the fraction of search queries issued to Bing, and on monetization. Query share seemed like a reasonable metric to give execs as a target, until it wasn’t. In 2007, Bing introduced Club Bing with games like Chicktionary (https://en.wikipedia.org/wiki/Club_Bing), where you had to form words from seven letters and earned “tickets” that had monetary value. Every word you guessed ran a Bing search, which improved the query share metric dramatically, with one problem: these were non-monetizable queries, because people were guessing words irrelevant to their daily lives, so nobody was clicking on the ads. The club was successful, and query share improved dramatically. Execs got their bonuses for beating the query share target, the ads team redefined its target to be monetization relative to “monetizable queries,” which excluded these games, and progress on all organizational metrics looked great, except that these were watermelon metrics (green on the outside, red on the inside). The club was shut down five years later. The shutdown didn’t affect exec bonuses because by then the target metric had changed to monetizable query share. I was there at the tail end; this was real. While there was some small value in claiming that Bing was making query share progress, the real progress was much slower.
  3. Programmer productivity is correlated with the number of lines of code written. It’s a reasonable way to measure progress, but once it becomes a target, it’s easy to game. Programmers were adding comments, breaking comments into multiple lines, and repeating code instead of turning snippets into reusable functions. Most people agree that achieving the same features with less code is better: higher quality, less maintenance (e.g., https://blog.codinghorror.com/the-best-code-is-no-code-at-all/).
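
To make the third example concrete, here is a minimal Python sketch. The helper `count_lines` and both snippets are hypothetical illustrations, not taken from any real tool or codebase. It shows how padding with comments inflates a raw line count while the amount of actual code stays the same.

```python
def count_lines(source: str) -> dict:
    """Count raw lines and non-blank, non-comment lines in Python-style source.

    Hypothetical helper for illustration only; it treats any line that
    starts with '#' (after stripping whitespace) as a comment.
    """
    raw = 0
    code = 0
    for line in source.splitlines():
        raw += 1
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            code += 1
    return {"raw": raw, "code": code}


# Two versions of the same feature (made-up snippets).
concise = '''def area(w, h):
    return w * h
'''

padded = '''# Compute the area
# of a rectangle
# given width and height
def area(w, h):
    # multiply
    # the two sides
    return w * h
'''

concise_counts = count_lines(concise)
padded_counts = count_lines(padded)
print("concise:", concise_counts)
print("padded: ", padded_counts)
```

Ignoring comments closes only one loophole. Duplicated code would still inflate the "code" count, which is why a tighter definition of the same metric is rarely enough on its own.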

As you think about your organizational OEC, always consider the gameability of the metrics and set up proper guardrail metrics to make sure you’re not fooling yourself with watermelon metrics. I’ve heard people say, “We won’t do stupid things,” but I’ve seen good organizations fool themselves too many times by chasing incentives tied to metrics.
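
One way to picture a guardrail is the rough Python sketch below. Everything in it is an assumption made for illustration: the `MetricChange` type, the `evaluate` function, the 1% tolerance, and the numbers, which are invented and are not Bing data. The idea is simply that a gain in the target metric is treated as suspect when a guardrail metric degrades beyond the tolerance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricChange:
    name: str
    relative_change: float  # e.g. 0.10 means +10% versus the baseline


def evaluate(target: MetricChange,
             guardrails: List[MetricChange],
             max_guardrail_drop: float = 0.01) -> str:
    """Flag a target gain as suspect if any guardrail dropped too far.

    max_guardrail_drop is an assumed tolerance (1% by default).
    """
    violated = [g.name for g in guardrails
                if g.relative_change < -max_guardrail_drop]
    if target.relative_change > 0 and violated:
        return "suspect: target improved but guardrails degraded: " + ", ".join(violated)
    if target.relative_change > 0:
        return "ok: target improved and guardrails held"
    return "no improvement in target"


# Invented numbers loosely modeled on the query share story above.
result = evaluate(
    target=MetricChange("query_share", 0.15),
    guardrails=[
        MetricChange("revenue_per_query", -0.12),
        MetricChange("sessions_per_user", 0.00),
    ],
)
print(result)
```

The choice of guardrails matters more than the code: they should be metrics that the target cannot improve without, such as revenue per query when the target is query share.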

Shelley Griffel

Executive | CEO | Business Development | Global Marketing | Strategy | Entrepreneur | C-Level Trusted Advisor | Result Driven | Leading Opening of an International New Market to Generate Revenue

9mo

Ron, thanks for sharing! An excellent Israeli company that is gaining momentum in the United States at a dizzying pace https://bardagaragedoor.com/

Nadav Eckstein

Analytics Team Lead at Natural Intelligence

10mo

I see this every time I go to McDonald’s. It seems they are now tracking the time between order placed and order ready. The problem is that since then, my order has never actually been ready when I got the “your order is ready” text.

Wei Dai

M.Sc in Statistics

1y

Very informative

Yashwanth Musiboyina

Product Data Science Leader at Intuit | Growth, Machine Learning, Product Strategy

1y

Thanks for sharing. Another example I came across was setting targets against user behavior metrics (e.g., 1-month activation) that are correlated with longer-term business outcome metrics (e.g., 1-year retention), only to realize the correlation is weak when the former metric improves dramatically without much change in the latter. Setting up long-term holdouts is one way to avoid these pitfalls. I’m curious whether you have found other methods to iterate on these metrics faster.

Great one! I wonder what was the prompt for the image to happen, haha
