The Importance of Experimentation in Product Development

Explore top LinkedIn content from expert professionals.

  • View profile for Pan Wu

    Senior Data Science Manager at Meta

    48,638 followers

    Product development entails inherent risks: hasty decisions can lead to losses, while overly cautious changes may result in missed opportunities. To manage these risks, proposed changes undergo randomized experiments that guide informed product decisions. This article, written by Data Scientists from Spotify, outlines the team's decision-making process and discusses how results from multiple metrics in A/B tests can inform cohesive product decisions. A few key insights include:

    - Defining key metrics: It is crucial to establish success, guardrail, deterioration, and quality metrics tailored to the product. Each type serves a distinct purpose, whether to improve the product, ensure non-deterioration, or validate experiment quality, and each plays a pivotal role in decision-making.

    - Setting explicit rules: Clear guidelines mapping test outcomes to product decisions are essential to mitigate metric conflicts. Given that metrics may move in different directions, establishing rules beforehand prevents subjective interpretation during hypothesis testing.

    - Handling technical considerations: Experiments with multiple metrics raise concerns about false positives. The team advises applying multiple-testing corrections to success metrics but emphasizes that this isn't necessary for guardrail metrics, where the requirement is instead that the treatment remain significantly non-inferior to the control on every guardrail metric (see the sketch after this post).

    Additionally, the team proposes comprehensive guidelines for decision-making, incorporating advanced statistical concepts. This resource is invaluable for anyone conducting experiments, particularly those dealing with multiple metrics.

    #datascience #experimentation #analytics #decisionmaking #metrics

    – – –

    Check out the "Snacks Weekly on Data Science" podcast and subscribe, where I explain in more detail the concepts discussed in this and future posts:
    -- Spotify: https://lnkd.in/gKgaMvbh
    -- Apple Podcast: https://lnkd.in/gj6aPBBY
    -- Youtube: https://lnkd.in/gcwPeBmR
    https://lnkd.in/gewaB9qC
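
    A minimal sketch of the decision rule described above, assuming p-values for each metric have already been computed upstream. The correction method (Bonferroni), the thresholds, and the function names are illustrative assumptions, not Spotify's actual implementation.

```python
# Illustrative sketch (not Spotify's implementation) of combining success and
# guardrail metrics into a single ship / no-ship decision.
from statsmodels.stats.multitest import multipletests

def decide(success_pvalues, guardrail_noninferiority_pvalues, alpha=0.05):
    """Ship only if at least one success metric improves after multiple-testing
    correction AND every guardrail is significantly non-inferior."""
    # Success metrics: correct for multiple comparisons (Bonferroni here;
    # the article recommends a correction without prescribing which one).
    success_sig, _, _, _ = multipletests(
        success_pvalues, alpha=alpha, method="bonferroni"
    )
    # Guardrails: per the post, no correction -- each metric must individually
    # be significantly non-inferior to control at level alpha.
    guardrails_ok = all(p < alpha for p in guardrail_noninferiority_pvalues)
    return "ship" if any(success_sig) and guardrails_ok else "do not ship"

# Example: two success metrics, three guardrail non-inferiority tests.
print(decide([0.01, 0.20], [0.02, 0.001, 0.03]))  # -> ship
```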

  • View profile for Yuzheng Sun

    Build with AI | Superlinear.Academy | 课代表立正

    33,044 followers

    Experimentation data scientists often have to remind everyone of these two principles over and over again:

    1. 𝐃𝐨𝐧’𝐭 𝐜𝐨𝐧𝐟𝐮𝐬𝐞 𝐚 𝐠𝐨𝐨𝐝 𝐞𝐱𝐩𝐞𝐫𝐢𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 𝐩𝐫𝐨𝐠𝐫𝐚𝐦 𝐰𝐢𝐭𝐡 𝐠𝐨𝐨𝐝 𝐢𝐝𝐞𝐚𝐬

    The value of experiments comes from surprises. If you expect a 10% lift, see a 10% lift, and ship it, the value came from the idea, not the experiment. The test confirmed what you already believed, so the experiment itself added zero value. But if you expect a 10% lift and see a 10% drop instead, then decide not to ship, that test saved you: it prevented a 10% hit and gave you valuable learnings. Ron Kohavi's research shows that in most companies only 8% to 33% of experiments succeed. That means most ideas don't work out, and knowing which ones fail is hugely valuable. 𝘋𝘰𝘯’𝘵 𝘵𝘳𝘺 𝘵𝘰 𝘨𝘶𝘦𝘴𝘴 𝘸𝘩𝘢𝘵’𝘴 𝘸𝘰𝘳𝘵𝘩 𝘵𝘦𝘴𝘵𝘪𝘯𝘨. 𝘛𝘦𝘴𝘵 𝘦𝘷𝘦𝘳𝘺𝘵𝘩𝘪𝘯𝘨.

    2. 𝐃𝐨𝐧’𝐭 𝐜𝐨𝐧𝐟𝐮𝐬𝐞 𝐭𝐡𝐞 𝐫𝐨𝐥𝐞 𝐨𝐟 𝐞𝐱𝐩𝐞𝐫𝐢𝐦𝐞𝐧𝐭 𝐰𝐢𝐭𝐡 𝐭𝐡𝐞 𝐫𝐨𝐥𝐞 𝐨𝐟 𝐝𝐚𝐭𝐚 𝐬𝐜𝐢𝐞𝐧𝐭𝐢𝐬𝐭

    People often want to look at an A/B test result and jump straight to a decision based on whether the results are stat-sig. That's a mistake. Modern products are complex; it's rare that we can just compare two numbers and ship the larger one. The best article on this is from Tom Cunningham, who summarizes the data scientist's role as the interpretation problem and the extrapolation problem: based on the observed effect from the experiment, what's the true effect on the business? And what's our best guess of the effect on unobservable variables, such as long-term revenue or user trust? The experiment's job is different. It creates the conditions for additional measurement through logging, it randomizes users so we can trust causality, and it gives us a clear read on what actually happened. That's it. Don't mix up the roles. And don't shoot the data scientist who tries to tell you these two principles.
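
    A back-of-the-envelope sketch of the asymmetry in principle 1. The 20% success rate and the lift sizes below are assumptions chosen for illustration (only the 8-33% range comes from the post); this is not the author's code.

```python
# Illustrative: when only a minority of ideas are true winners, the value of
# an experimentation program comes mostly from the losers it screens out.
def total_impact(p_success, win_lift, loss_lift, n_ideas, tested):
    """Total impact of n_ideas, shipped blindly vs. shipped only when a
    (perfectly reliable, for simplicity) experiment shows a win."""
    if tested:
        return n_ideas * p_success * win_lift          # ship winners only
    return n_ideas * (p_success * win_lift + (1 - p_success) * loss_lift)

p = 0.20  # assume 20% of ideas truly work (within the cited 8-33% range)
print(total_impact(p, +0.02, -0.01, 100, tested=False))  # ship all 100: -0.4
print(total_impact(p, +0.02, -0.01, 100, tested=True))   # ship winners: +0.4
```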

  • View profile for Ryan Lucht

    Build your Experimentation Advantage | Senior Advocate @ Datadog

    5,071 followers

    Set aside 20% of your experimentation budget (slash time) and let your team run wild. Forget about the prioritized roadmap for a bit. Make research-backed hypotheses totally optional. Just purely self-directed exploration.

    One engineer tinkering with an idea that nobody else thought was worthwhile (lengthening ad headlines) led to the "best revenue-generating idea in Bing's history" - worth well over nine figures (per the Kohavi et al. book).

    But beyond the reality of how often we are surprised by our experiment outcomes, this is a powerful way to get teams more interested in running experiments. Autonomy is a powerful driver of motivation. And intrinsic motivation like a feeling of autonomy can be far more powerful than offering some "carrots" of promised career growth or "sticks" via making experimentation mandatory.

    The Post-It Note was invented because 3M gave engineers 15% of their time to work on whatever side projects they'd like. Google's version was "20 Percent Time", which led to AdSense, Google News, search autocompletes, and (according to some) Gmail. The reality of these often-cited programs is a little hazier than the simple headlines suggest... (I feel like I can sense some ex-Googler ready to comment and say "nobody actually did this" 😂) But the principles are sound:

    1) Use autonomy as fuel to get people experimenting
    2) Make progress by tinkering

    Nassim Taleb in "Skin in the Game": "The knowledge we get by tinkering, via trial and error... is vastly superior to that obtained through reasoning"

    How much space have you made for your teams to freely explore and experiment?

  • View profile for Steve Powers

    Senior Director of Data Science at Moloco | Machine Learning, Online Marketing

    3,178 followers

    Hi - I'm Steve. I am a professional fail-er.

    Many times data teams are asked questions that pertain to things the business has never done before. This might be creating opportunity sizing for a new feature, forecasting adoption or performance for our customers, or building recommendations, either for the business or for customers, to suggest relevant improvements or features to adopt. The challenge with many of these problems is that there's not always a black-or-white answer, and we tend not to have complete datasets that let us paint the full picture. As a result, we end up building assumptions into our models, basing them on past experience, similar features, user behavior, and other correlational analysis.

    Data teams that are not comfortable with the concept of failing fast can fall into the pitfall of 'paralysis by analysis', whereby we fail to make a recommendation because of the uncertainty that implicitly exists in the data. The easiest way to delay a project or deliverable is to ask for more data, which inevitably begets more questions and sometimes causes us to lose sight of the goal we were trying to accomplish in the first place.

    A much more effective approach, I have found, is to clearly draw out what assumptions we must make to size the feature or conduct the analysis. Establish clearly the risk of being wrong on any of those assumptions, and clearly evaluate one-way (irreversible) and two-way (reversible) decisions. The goal is to have enough 'low stakes' experiments, where we can easily roll back the change, to gain enough confidence in the assumptions we must make for the 'high stakes' decisions, where reversing the change is either incredibly costly or sometimes infeasible. Through this approach, we're able to dedicate the lion's share of the analysis time to firming up the hypotheses we must make for the 'high risk' decisions and apply the highest level of rigor in terms of experimentation and burden of evidence. 'Low risk' areas let us broaden our knowledge of the product, building confidence in our assumptions and creating data for us to explore 'why wasn't my assumption accurate?'

    Creating controlled environments to fail fast will not only enable you to learn faster, it will enable teams to build confidence in their ability to test their assumptions and debug when the stakes are high. If you create an environment where *every* decision requires an insurmountable burden of evidence, you run the risk of stifling innovation and having a data team that's not equipped to debug situations when our assumptions inevitably turn out wrong.

    My suggestion to data teams is to embrace (controlled) failure. No one asks 'why did this roll-out go so well?', but the question 'what went wrong?' always arises when our predictions do not materialize. Ensure you're prepared for those situations by learning *how* to fail.
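
    A minimal sketch of the kind of assumptions register described above; the structure, field names, and example assumptions are hypothetical, not from the post.

```python
# Hypothetical sketch: write each assumption down with its risk and whether the
# decision it supports is reversible, then concentrate experimental rigor on
# the irreversible (one-way door), high-risk ones.
from dataclasses import dataclass

@dataclass
class Assumption:
    statement: str
    risk_if_wrong: str   # "low" / "medium" / "high"
    reversible: bool     # two-way door (True) vs. one-way door (False)

register = [
    Assumption("Adoption will mirror a similar feature shipped last year", "medium", True),
    Assumption("Enterprise customers will accept the new pricing model", "high", False),
    Assumption("Recommendation clicks translate into feature adoption", "low", True),
]

high_stakes = [a for a in register if not a.reversible and a.risk_if_wrong == "high"]
low_stakes = [a for a in register if a.reversible]

print("Apply full experimental rigor to:", [a.statement for a in high_stakes])
print("Probe with cheap, roll-backable tests:", [a.statement for a in low_stakes])
```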

  • View profile for Ashish Bhatia

    AI Product Leader | GenAI Agent Platforms | Evaluation Frameworks | Responsible AI Adoption | Ex-Microsoft, Nokia

    16,133 followers

    𝐁𝐮𝐢𝐥𝐝𝐢𝐧𝐠 𝐜𝐮𝐥𝐭𝐮𝐫𝐞 𝐨𝐟 𝐝𝐚𝐭𝐚 & 𝐞𝐱𝐩𝐞𝐫𝐢𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧

    As Product Managers, we often operate in a world of ambiguity, where clarity might not exist upfront and decisions need to be made before all the data is available. In this environment, having a strong point of view is not just helpful, it's essential. I always say that PMs need to be opinionated and able to operate amid ambiguity, but equally important is the discipline to evolve that perspective as real signals come in. I've come to see product development as a continuous loop of belief, validation, and adaptation.

    At the earliest stage of a feature or an idea, when things are still abstract, I lean towards research (AI is a great companion tool here), deep customer conversations, and focused usability studies and design validations. These are small in scale but incredibly high in signal. They help us identify sharp edges, product friction, and opportunities for business value creation; these insights don't show up on dashboards at this early stage but shape everything downstream.

    As we move toward broader public release, the feedback loop changes. Now it's about pattern recognition: experimentation at scale. A/B testing and telemetry become my tools of operation. We can infer from telemetry data how customers use our product, not just what they say. The decisions become more about optimization, validation, understanding impact, and fine-tuning the rough edges.

    Once something is live, our responsibility doesn't end. In fact, it shifts. At scale, telemetry tells us where the system is healthy and where the product cliffs are. Feedback from customers using the product to create business value, and their pain points, restarts the process of new improvements.

    This is my worldview: it's not about getting it right the first time, it's about building the muscle to learn fast and course-correct with confidence. What's in your toolbox?

    #productmanagers #dataculture #experimentation

  • View profile for Aakash Gupta

    The AI PM Guy 🚀 | Helping you land your next job + succeed in your career

    284,777 followers

    At Microsoft, 67% of experiments fail. At Google, Bing and Netflix, >80% fail. But that doesn't mean experiment less. Actually, experiment more. It was fascinating to learn this data from Ron Kohavi. Here's why failure is actually powerful:

    — PART ONE - Surprising Things Can Work

    At Bing, a simple text tweak to bring the first line of the description into the ad title sat in the backlog for months. No one thought it mattered. But eventually, they tested it. The result? $100M+ in annual revenue. Without experimentation, that idea never would've seen the light of day. That is what testing gives you: real insight. Not opinion. Not intuition. Not gut feel. So why don't most teams operate this way?

    — PART TWO - "Gut Feeling" Is the Lowest Form of Evidence

    This is the hierarchy of decision-making quality (best to worst):
    - Meta-analyses of experiments
    - Randomized controlled experiments (A/B tests)
    - Non-randomized controlled tests
    - Observational studies
    - Case studies, anecdotes, expert opinions (HiPPOs)

    Now, ask yourself: where does most roadmap prioritization happen? Right at the bottom. If you want to build smarter, you need to move up the pyramid. That starts by testing more. And embracing failure as part of the process.

    — PART THREE - Experimentation IS Learning

    The only way to actually learn what's working with certainty is experimentation. While it's okay to rely on baselines for QBRs, they won't give you confidence in a product intervention: there are always confounding variables. If you want confidence in what's working, you need experiments. Or, ideally, meta-analyses across many experiments.

    If you want to master experimentation and build a system where every test teaches you something, go here for the full deep dive: https://lnkd.in/ea8sWSsS
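
    An illustrative calculation (assumed alpha and power, not Kohavi's data) of why low success rates call for more experiments and stronger evidence, not fewer: when few ideas truly work, a single statistically significant result is less likely to reflect a real winner, which is part of why meta-analyses sit above individual A/B tests in the hierarchy.

```python
# Illustrative Bayes calculation: P(idea truly works | test is significant)
# under assumed significance level and statistical power.
def prob_true_winner(base_rate, alpha=0.05, power=0.8):
    true_positives = base_rate * power          # real winners that reach significance
    false_positives = (1 - base_rate) * alpha   # duds significant by chance
    return true_positives / (true_positives + false_positives)

print(round(prob_true_winner(0.33), 2))  # ~33% of ideas work -> ~0.89
print(round(prob_true_winner(0.10), 2))  # ~10% of ideas work -> ~0.64
```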

  • View profile for Ujwal Kalra

    CEO, ARPflow - AI for deduction mgmt. | BCG | Author

    18,378 followers

    Google famously tested 41 shades of blue on its homepage, discovering one that boosted revenue by $200 million. Facebook runs thousands of tests simultaneously—tweaking algorithms, interfaces, and ads daily. Airbnb tested professional photography in listings, instantly doubling bookings.

    This month's Townhall at NAKAD was all about #Experimentation!

    Experiments aren't just good practice—they're survival. They replace assumptions with data, rapidly uncovering customer truths. As Anu Hariharan, former Partner at YC Continuity Fund, explained to me for my book Startup Compass: "Speed of experimentation beats the quality of ideas every single time."

    Why speed? Because startups rarely fail due to a lack of good ideas—they fail by burning resources chasing untested assumptions. Fast experiments validate ideas quickly, cutting the cost of failure and accelerating growth. Uber's surge pricing was an experiment that fundamentally changed transportation pricing dynamics.

    In startup journeys, the currency isn't ideas—it's insights, and experiments generate these insights at scale. The faster your experimentation cycle, the quicker you iterate towards product-market fit.

    #Startups #Experimentation #Growth #StartupCompass

  • View profile for Rishabh Jain

    Co-Founder / CEO at FERMÀT - the leading commerce experience platform

    13,365 followers

    It's Wednesday, you know what that means: Whiteboard Wednesday. In today's episode: Experimentation is NOT testing. Let's get into it. (Bookmark for later if you need.)

    Here's the matrix framework I use to drive impactful experimentation. On the X-axis, we have risk: are you open to a high-risk change, or do you want high certainty? On the Y-axis, we have impact: are you targeting marginal improvement or 10x impact?

    On the right side of the matrix, we have high certainty. Now here's the kicker. Most companies think A/B testing is the holy grail, that it's where the buck stops with testing. The truth is, A/B testing lives on the low end of the impact spectrum, where you're optimizing for high certainty. And there are scenarios where that's great—it's the perfect approach for optimizing established processes and proving small wins. But it's only a piece of the puzzle. On the other end of the impact spectrum, if you want to 10x your business with high certainty, that's going to be high-cost experimentation. While appealing, these opportunities often come with a hefty price tag, like acquiring a successful business to boost revenue.

    As for the left side of the matrix (high risk), the high-risk, 10x-impact quadrant is what I call your "moonshot." This is where great experimental design can shine. It's where you use customer acquisition costs or funds to test a new concept. Finally, the only type of experimentation I would truly NOT recommend to anyone: bad bets. A common example is a full site redesign with limited upside but significant (potentially 100%) downside risk.

    TL;DR: I'm not saying A/B testing is bad or to abandon it completely. But let's not equate testing with experimentation. Testing is low risk and low reward. Experimentation is about innovation—and that requires either more risk or more investment.

    ps: thanks Cherene Aubert for the recent posts that inspired me to share
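
    A compact restatement of the whiteboard matrix as a lookup, purely for illustration; the quadrant labels paraphrase the post, and the code is not the author's.

```python
# Hypothetical sketch: map (risk appetite, target impact) to the quadrant
# described in the post.
QUADRANTS = {
    ("high_certainty", "marginal"): "A/B testing: optimize established flows, prove small wins",
    ("high_certainty", "10x"): "High-cost experimentation, e.g. acquiring a successful business",
    ("high_risk", "10x"): "Moonshot: where great experimental design shines",
    ("high_risk", "marginal"): "Bad bet, e.g. a full site redesign with limited upside",
}

def classify(risk: str, impact: str) -> str:
    """Return the quadrant for an initiative's risk appetite and target impact."""
    return QUADRANTS[(risk, impact)]

print(classify("high_risk", "10x"))
```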

  • View profile for Jon MacDonald

    Turning user insights into revenue for top brands like Adobe, Nike, The Economist | Founder, The Good | Author & Speaker | thegood.com | jonmacdonald.com

    15,225 followers

    Imagine a world where your SaaS team could make data-driven decisions in days, not weeks. Welcome to the game-changing power of rapid experimentation.

    Think of rapid testing as your team's secret weapon. It delivers results in days, not weeks. This speed isn't just about efficiency; it's about staying ahead in a competitive market. But speed isn't the only benefit. Rapid testing cuts through the politics of A/B testing: no more debates about which tests to run, because the data from rapid tests informs those decisions.

    It's also a cost-effective way to narrow down ideas. When you're swimming in possibilities, rapid testing helps you find the best ones quickly. For teams on tight budgets, this approach is a lifesaver: smaller sample sizes mean lower costs.

    But don't mistake speed for superficiality. Rapid tests can uncover deep insights. They reveal usability issues before they become expensive problems. They provide the 'why' behind user behavior. This qualitative depth is invaluable; it informs better solutions and more user-centric designs.

    Ultimately, rapid experimentation is about de-risking decisions. It's about testing early and often. And that ensures that when you do release a feature or product, it truly meets user needs. In the fast-paced world of SaaS, rapid experimentation isn't just useful. It's essential. It's how smart teams stay agile, informed, and ahead of the curve.

  • View profile for Bryan Zmijewski

    Started and run ZURB → 2,500+ teams stopped guessing • Decisive design starts with fast user signals

    12,149 followers

    Design is the test.

    Something interesting starts to happen in a continuous discovery process. Imagine having 100 users test a concept feature, discovering they dislike it, and then quickly trying different multivariate variations to find a better solution, all in just a few hours. At the same time, you explore new opportunities with another sample of 100 users to generate new ideas in parallel. The line between making and researching disappears.

    Design evolves from a tool used to resolve internal debate among product and engineering teams to a proactive means of gaining insights from a targeted audience. This shift implies that creating concepts, visuals, user flows, text, or animations starts with designing tests and surveys rather than envisioning the final product for internal consumption. Consequently, the visualization of the product becomes a secondary outcome of the experimentation and learning journey. Although the development process may begin with a PRD or by collecting requirements, the product's evolution is driven by discussions about user behaviors and responses.

    What does this future look like?

    → Design shifts from finalizing products to learning through user tests, emphasizing quick iteration from feedback.
    → User feedback guides rapid changes, with simultaneous testing groups for problem-solving and idea generation.
    → Creating and researching merge into one continuous process, removing traditional boundaries.
    → The role of design transforms from internal conflict resolution to actively seeking user insights, starting with tests over product visions.
    → Visualization of products becomes secondary, focusing on learning from user interactions before final designs.
    → Development is driven by user behavior and feedback, even if it begins with standard requirements.

    I use the words test/survey/discovery/research/learning liberally, knowing that methods and techniques vary, but the central component is user engagement. That's how we consider it in Helio. What does the future look like to you?

    #productdesign #productdiscovery #uxresearch #marketresearch
