The Art of A/B Testing

Experimenting on Humans
Aviran Mordo
Head of Engineering
@aviranm
www.linkedin.com/in/aviran
www.aviransplace.com

Wix In Numbers
Over 100M users + 1.5M new users/month
Static storage is >5 PB of data
3 data centers + 3 clouds (Google, Amazon, Azure)
5B HTTP requests/day
1500 people work at Wix, of which ~600 are in Engineering

Agenda
Basic A/B testing
Experiment driven development
PETRI – Wix’s 3rd generation open source experiment system
Challenges and best practices
Complexities and effect on product

Home page results
(How many registered)

Gallery manager
What can we improve?

Product Experiments Toggles & Reporting Infrastructure

How do you know what is running?

Why so many?
If I “know” it is better, do I really need to test it?

The theory
Sign-up → Choose Template → Edit site → Publish → Premium

Conclusion
EVERY new feature is A/B tested
We open the new feature to a % of users
• Measure success
• If it is better, we keep it
• If worse, we check why and improve
If flawed, the impact is limited to that % of our users

Sh*t happens (Test could fail)
New code can have bugs
Conversion can drop
Usage can drop
Unexpected cross-test dependencies

Language
GEO
Browser
User-agent
OS
Minimize affected users (in case of failure)
Gradual exposure (percentage of…)
Company employees
User roles
Any other criteria you have (extendable)
All users
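
The criteria above can be read as composable eligibility filters. The following is a minimal sketch under that assumption; `Visitor`, `language_in`, `country_in`, `employees_only` and `is_eligible` are hypothetical names chosen for illustration and are not taken from PETRI.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Visitor:
    """Hypothetical request context; the fields mirror some of the slide's criteria."""
    language: str
    country: str
    browser: str
    is_employee: bool = False


# A filter is any predicate over a visitor, which keeps the set extendable.
Filter = Callable[[Visitor], bool]


def language_in(*languages: str) -> Filter:
    return lambda v: v.language in languages


def country_in(*countries: str) -> Filter:
    return lambda v: v.country in countries


def employees_only() -> Filter:
    return lambda v: v.is_employee


def is_eligible(visitor: Visitor, filters: Iterable[Filter]) -> bool:
    # Every filter must accept the visitor. With no filters at all,
    # all() returns True, which corresponds to "All users".
    return all(f(visitor) for f in filters)
```

Starting with a narrow filter set (for example, company employees only) and loosening it over time is one way to minimize the number of affected users if the new code fails.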

Not all users are equal
First time visitors = Never visited wix.com
New registered users = Untainted users
Existing registered users = Already familiar with the service

Start new experiment (limited population)

First trial failed
Performance had to be improved

Halting the test results in loss of data.
What can we do about it?

Solution – Pause the experiment!
• Maintain NEW experience for already exposed users
• No additional users will be exposed to the NEW feature

PETRI’s pause implementation
Use cookies to persist assignment
If the user changes browser, the assignment is unknown
Server-side persistence solves this
You pay in performance & scalability
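
The pause rule can be sketched as follows. This is an illustration only, not PETRI's code: a plain dict stands in for the cookie jar, and `assign`, `test_share` and `rng` are hypothetical names.

```python
import random
from typing import Callable, Dict


def assign(cookies: Dict[str, str], experiment: str, paused: bool,
           test_share: float, rng: Callable[[], float] = random.random) -> str:
    """Return "A" (control) or "B" (new experience) for one visitor."""
    # Already exposed users keep whatever they were given, paused or not.
    if experiment in cookies:
        return cookies[experiment]
    # While paused, nobody new is exposed. The result is not persisted,
    # so the visitor can still be tossed once the experiment resumes.
    if paused:
        return "A"
    group = "B" if rng() < test_share else "A"
    cookies[experiment] = group  # persist the assignment in the cookie
    return group
```

Because the assignment lives in the cookie, a visitor who switches browsers arrives with an empty jar and is treated as new, which is the limitation the slide notes.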

Decision (What to do with the data)
• Keep feature
• Improve code & resume experiment
• Drop feature
  Keep backwards compatibility for exposed users forever?
  Migrate users to another equivalent feature
  Drop it altogether (users lose data/work)

Numbers look good but sample size is small
We need more data!
Expand
Reaching statistical significance
Test Group (B):    25%  50%  75%  100%
Control Group (A): 75%  50%  25%  0%
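
One common way to check whether a difference in conversion is statistically significant is a two-proportion z-test. The slides do not say which test Wix uses, so the sketch below is an assumption for illustration, and the function name and sample figures are made up.

```python
import math


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates.

    Assumes both groups are non-empty and the pooled rate is strictly
    between 0 and 1 (otherwise the standard error would be zero).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Same observed rates (10% vs 13%) at two sample sizes.
p_small = two_proportion_p_value(10, 100, 13, 100)
p_large = two_proportion_p_value(1000, 10000, 1300, 10000)
```

With 100 users per group the p-value stays well above the usual 0.05 threshold, while the same rates at 10,000 users per group give a p-value far below it. That is the reason to expand the test group when the numbers look good but the sample is small.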

Keep user experience consistent
Control Group (A)
Test Group (B)

Keeping consistent UX
Signed-in user
• Test group is determined by the user ID
• Guarantees toss consistency across browsers
Anonymous user (Home page)
• Test group is randomly determined
• Cannot guarantee a consistent experience across browsers
11% of Wix users use more than one desktop browser
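
Determining the test group from the user ID is often done by hashing the ID into a fixed number of buckets. The sketch below assumes that approach; the slides do not describe PETRI's actual scheme, and `bucket` and `group_for` are hypothetical names.

```python
import hashlib


def bucket(user_id: str, experiment: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for one experiment."""
    # Including the experiment name keeps tosses independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def group_for(user_id: str, experiment: str, test_percent: int) -> str:
    # The same user ID always lands in the same bucket, on any browser or device.
    return "B" if bucket(user_id, experiment) < test_percent else "A"
```

A side effect of this design is that raising `test_percent` (25, 50, 75, 100) only moves users from A to B and never back, so users who have already seen the new experience keep it as the experiment expands.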

Always exclude robots
Don’t let Google index a losing page
Don’t let bots affect statistics
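
A simple way to apply both rules is to give probable bots the control experience and keep them out of the reports. The user-agent check below is a deliberately naive heuristic with an illustrative marker list; the names are hypothetical, and real bot detection needs more than substring matching.

```python
from typing import Optional, Tuple

# Illustrative substrings only; not an exhaustive or authoritative list.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_probable_bot(user_agent: Optional[str]) -> bool:
    # A missing user agent is treated as a bot in this sketch.
    if not user_agent:
        return True
    lowered = user_agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def decide(user_agent: Optional[str], assigned_group: str) -> Tuple[str, bool]:
    """Return (group to render, whether to count the visit in statistics)."""
    if is_probable_bot(user_agent):
        return "A", False  # bots always see control and are never counted
    return assigned_group, True
```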

Possible states >= 2^(# experiments)
# of active experiments    Possible # of states
10                         1,024
20                         1,048,576
30                         1,073,741,824
Wix has ~1000 active experiments: ~1.071509e+301 possible states
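
The figures on this slide follow from treating each experiment as an independent two-way toss (A or B). A short check, with `possible_states` as a hypothetical helper name:

```python
def possible_states(active_experiments: int) -> int:
    # Two groups per experiment gives 2^N combinations; experiments with
    # more than two variants would only push the count higher.
    return 2 ** active_experiments


for n in (10, 20, 30):
    print(f"{n:>4} experiments -> {possible_states(n):,} states")

# Python integers are arbitrary precision, so 2^1000 is exact here and
# is converted to a float only for display in scientific notation.
print(f"1000 experiments -> {float(possible_states(1000)):.6e} states")
```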

Supporting 2^N different users is challenging
How do you know which experiment causes errors?
Managing an ever-changing production environment

Specialized tools
Near real time user BI tools
Override options (URL parameters, cookies, headers…)
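
Override options let a developer or QA engineer force a specific group when reproducing a problem. The sketch below shows one possible precedence (URL parameter, then header, then cookie, then the normal assignment). The key `exp_override`, the header `X-Exp-Override` and the `name:group` format are assumptions made up for this example, not PETRI's real names.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlparse

OVERRIDE_KEY = "exp_override"       # hypothetical URL parameter and cookie name
OVERRIDE_HEADER = "X-Exp-Override"  # hypothetical request header


def parse_override(raw: str) -> Dict[str, str]:
    """Parse "gallery:B,editor:A" into {"gallery": "B", "editor": "A"}."""
    overrides = {}
    for part in raw.split(","):
        name, sep, group = part.partition(":")
        if sep and name.strip() and group.strip():
            overrides[name.strip()] = group.strip()
    return overrides


def resolve_group(experiment: str, assigned: str, url: str = "",
                  headers: Optional[Dict[str, str]] = None,
                  cookies: Optional[Dict[str, str]] = None) -> str:
    query = parse_qs(urlparse(url).query)
    sources = [
        query.get(OVERRIDE_KEY, [""])[0],        # 1. URL parameter
        (headers or {}).get(OVERRIDE_HEADER, ""),  # 2. request header
        (cookies or {}).get(OVERRIDE_KEY, ""),   # 3. cookie
    ]
    for raw in sources:
        overrides = parse_override(raw)
        if experiment in overrides:
            return overrides[experiment]
    return assigned  # no override present: keep the normal assignment
```

For example, under these assumptions a request to `https://example.com/?exp_override=gallery:B` would render group B of the `gallery` experiment regardless of the visitor's own toss.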

Share document with other users

Document owner is part of a test that enables a new video component

What will the other user experience when editing a shared document?
Owner Friend

Assignment may be different than owner’s
Owner (B) Friend (A)

Possible solutions
Enable features by existing content
• What will happen when you remove a component?
Enable features by document owner’s assignment
• The friend now expects to find the new feature on his own docs
Exclude experimental features from shared documents
• You are not really testing the entire system

Petri is more than just an A/B test framework
Feature toggle
A/B Test
Personalization
Internal testing
Continuous deployment
Jira integration
Experiments
Dynamic configuration
QA
Automated testing

Petri is an open source project
https://github.com/wix/petri

Q&A
https://github.com/wix/petri
http://goo.gl/dqyely


Why Petri
Modeled experiment lifecycle
Open source (developed using TDD from day 1)
Running at scale on production
No deployment necessary
Both back-end and front-end experiments
Flexible architecture

[Architecture diagram: PETRI Server, Your app, Laboratory, DB, Logs]
