How to set alpha when you have underpowered experiments?
Jakub Linowski asked how to assign a trust level to a corpus of experiments when some of them have low power (post).
If you’re designing a new experiment, I strongly recommend designing it with sufficient power (e.g., over 80% power for a minimum detectable effect, or MDE, of at most 5%).
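To check whether a design meets that bar, here is a rough Python sketch. It makes several assumptions that are not in the post: a conversion-rate metric, a two-sided z-test, an equal traffic split, variance taken at the baseline rate, and an MDE that is relative to the baseline. The function names and the 5% baseline rate in the usage line are hypothetical. With α = 0.05 and 80% power, the constant 2(z₀.₉₇₅ + z₀.₈)² is about 15.7. That is why this reduces to roughly 16σ²/δ² users per variant.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.8):
    """Users per variant for a two-sided z-test on a conversion rate
    (normal approximation, equal split, Bernoulli variance at the baseline)."""
    z = NormalDist().inv_cdf
    delta = baseline_rate * relative_mde            # absolute effect to detect
    variance = baseline_rate * (1 - baseline_rate)  # sigma^2 of a 0/1 metric
    n = 2 * variance * (z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2
    return math.ceil(n)


def achieved_power(baseline_rate, relative_mde, n_per_variant, alpha=0.05):
    """Approximate power of a design that was already run
    (ignores the negligible chance of significance in the wrong direction)."""
    nd = NormalDist()
    delta = baseline_rate * relative_mde
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_variant)
    return nd.cdf(delta / se - nd.inv_cdf(1 - alpha / 2))


if __name__ == "__main__":
    # Hypothetical: 5% baseline conversion, 5% relative MDE.
    n = sample_size_per_variant(0.05, 0.05)
    print(n, achieved_power(0.05, 0.05, n), achieved_power(0.05, 0.05, 30_000))
```

The second function is the one that matters for experiments you already ran. Plug in the traffic each variant actually received to see whether the design falls short of 80% power.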
What about experiments that you previously ran that do not meet the bar? In that case, I would lower the alpha (the p-value threshold for calling something statistically significant) to control the False Positive Risk (FPR), that is, the probability that a statistically significant result is a false positive.
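For reference, here is the FPR formula from the Intuition Busters paper linked in this post, as we understand it. Here α is the p-value threshold, 1−β is the power, and π is the prior probability that the null hypothesis is true (the idea has no real effect). The block solves the formula for α. It then finds the α that keeps the FPR equal to that of a reference design with α₀ = 0.05 and 80% power. Treating those values as "the defaults" is our assumption.

```latex
\mathrm{FPR} = \frac{\alpha\,\pi}{\alpha\,\pi + (1-\beta)(1-\pi)}
\quad\Longrightarrow\quad
\alpha = \frac{\mathrm{FPR}}{1-\mathrm{FPR}}\cdot\frac{1-\pi}{\pi}\cdot(1-\beta)

\mathrm{FPR}(\alpha,\,1-\beta) = \mathrm{FPR}(\alpha_0,\,1-\beta_0)
\iff \frac{\alpha}{1-\beta} = \frac{\alpha_0}{1-\beta_0}
\iff \alpha = \frac{0.05\,(1-\beta)}{0.8} = \frac{1-\beta}{16}
```

The FPR depends on α and power only through the ratio α/(1−β), so the prior π cancels in the last step. Halving the power calls for halving alpha to keep the same FPR.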
The formula is not complicated, and we shared it in https://bit.ly/ABTestingIntuitionBusters:
where
Plugging in the above defaults gives:
Let’s look at a few examples:
These thresholds may look hard to achieve, but that’s the penalty you have to pay if you want to control the false positive risk. Note that with low-powered experiments, the treatment effect of a statistically significant result is still expected to be exaggerated.
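Below is a minimal Python sketch of the calculation, plus a check of the exaggeration point. It assumes the FPR formula from the linked paper as we understand it: FPR = απ / (απ + (1−β)(1−π)). The input `prior_null` (π, the share of ideas with no real effect) is hypothetical. `alpha_matching_design` assumes "the defaults" mean α = 0.05 with 80% power. The last function is a simulation of our own, not from the post. Among statistically significant results, it averages |estimate| divided by the true effect, measured in standard-error units.

```python
import random
from statistics import NormalDist


def false_positive_risk(alpha, power, prior_null):
    """FPR = alpha*pi / (alpha*pi + power*(1 - pi)), with pi = P(H0 is true)."""
    num = alpha * prior_null
    return num / (num + power * (1.0 - prior_null))


def alpha_for_target_fpr(target_fpr, power, prior_null):
    """The FPR formula solved for alpha."""
    return target_fpr / (1.0 - target_fpr) * (1.0 - prior_null) / prior_null * power


def alpha_matching_design(power, ref_alpha=0.05, ref_power=0.8):
    """Alpha giving the same FPR as the reference design; the prior cancels out."""
    return ref_alpha * power / ref_power


def exaggeration_ratio(power, alpha=0.05, n_sims=200_000, seed=0):
    """Simulated mean |estimate| / true effect among significant results."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # True effect (in standard errors) that yields the requested power,
    # ignoring the tiny chance of significance in the wrong direction.
    effect = z_crit + NormalDist().inv_cdf(power)
    rng = random.Random(seed)
    estimates = (rng.gauss(effect, 1.0) for _ in range(n_sims))
    significant = [abs(z) for z in estimates if abs(z) > z_crit]
    return sum(significant) / len(significant) / effect


if __name__ == "__main__":
    prior_null = 0.9  # hypothetical: 10% of ideas truly move the metric
    for power in (0.8, 0.5, 0.3, 0.1):
        a = alpha_matching_design(power)
        print(f"power={power:.0%}  alpha={a:.5f}  "
              f"FPR@0.05={false_positive_risk(0.05, power, prior_null):.2f}  "
              f"FPR@alpha={false_positive_risk(a, power, prior_null):.2f}")
    print(f"exaggeration at 20% power: {exaggeration_ratio(0.2):.2f}x")
```

With these assumptions, the FPR at α = 0.05 climbs as power drops. The adjusted alpha holds it at the level of the 80%-power design. The simulation is consistent with the note above. Significant results from a 20%-powered experiment overstate the true effect by roughly 2.2x, versus roughly 1.1x at 80% power.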
Also note that there are several additional factors to check to increase trust. These were discussed here.
Ecommerce Strategy, Growth & Behavioural Design | The Art of Ecommerce & The Choice Labs
What is the benefit of doing this calculation? If an experiment is underpowered and you don't want false positives, then you can detect almost nothing. I think it's easier than that: if you don't have enough power to run an experiment, don't run an experiment. Redesign the experiment or do some other test. Tweaking the numbers won't increase the power. Or will it?
Product Strategy & Data Science at Bitpanda
Is there a reference to this formula?
Staff Data Scientist | Gen-AI / LLM / ML-Ops | Ex-Amazon
That's very informative, Ron!
Principal Economist at Amazon
Implicit in this seems to be an assumption that bad launches (i.e., launching when you shouldn’t) are much worse than missed opportunities (i.e., not launching when you should). Would be interested to learn more about how you think about the relative costs of these mistakes.