The Art of A/B Testing
Experimenting on Humans
Aviran Mordo
Head of Engineering
@aviranm
www.linkedin.com/in/aviran
www.aviransplace.com
Wix In Numbers
Over 100M users + 1.5M new users/month
Static storage is >5 PB of data
3 data centers + 3 clouds (Google, Amazon, Azure)
5B HTTP requests/day
1500 people work at Wix, of which ~600 are in Engineering
Basic A/B testing
Experiment driven development
PETRI – Wix’s 3rd generation open source experiment system
Challenges and best practices
Complexities and effect on product
Agenda
A/B Test
Home page results
(How many registered)
This is the Wix editor
Gallery manager
What can we improve?
Is this better?
Product Experiments, Toggles & Reporting Infrastructure
How do you know what is running?
If I “know” it is better, do I really need to test it?
Why so many?
Sign-up → Choose Template → Edit site → Publish → Premium
The theory
EVERY new feature is A/B tested
We open the new feature to a % of users
• Measure success
• If it is better, we keep it
• If worse, we check why and improve
If flawed, the impact is limited to a % of our users
Conclusion
New code can have bugs
Conversion can drop
Usage can drop
Unexpected cross test dependencies
Sh*t happens (Test could fail)
Language
GEO
Browser
User-agent
OS
Minimize affected users (in case of failure)
Gradual exposure (percentage of…)
Company employees
User roles
Any other criteria you have (extendable)
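The criteria listed above can be read as composable eligibility filters: a visitor enters an experiment only if every filter accepts them. The following is a minimal Python sketch of that idea. The Visitor record, the filter helpers, and is_eligible are hypothetical names chosen for illustration and are not PETRI's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Visitor:
    """Hypothetical request context; fields mirror some of the criteria above."""
    language: str
    country: str
    browser: str
    is_employee: bool = False


# A filter is any predicate over a visitor, which keeps the set extendable.
Filter = Callable[[Visitor], bool]


def language_in(*languages: str) -> Filter:
    allowed = frozenset(languages)
    return lambda v: v.language in allowed


def country_in(*countries: str) -> Filter:
    allowed = frozenset(countries)
    return lambda v: v.country in allowed


def employees_only() -> Filter:
    return lambda v: v.is_employee


def is_eligible(visitor: Visitor, filters: Iterable[Filter]) -> bool:
    """A visitor may enter the experiment only if every filter accepts them."""
    return all(f(visitor) for f in filters)


# Example: open an experiment to English speakers in the US and Canada only.
filters = [language_in("en"), country_in("US", "CA")]
print(is_eligible(Visitor("en", "US", "Chrome"), filters))  # True
print(is_eligible(Visitor("en", "DE", "Chrome"), filters))  # False
```

Because a filter is just a predicate, "any other criteria you have" becomes one more function in the list.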
All users
First time visitors = Never visited wix.com
New registered users = Untainted users
Existing registered users = Already familiar with the service
Not all users are equal
Start new experiment (limited population)
Calling Laboratory is Easy
Adding a mobile view
First trial failed
Performance had to be improved
Halting the test results in loss of data.
What can we do about it?
Solution – Pause the experiment!
• Maintain NEW experience for already exposed users
• No additional users will be exposed to the NEW feature
PETRI’s pause implementation
Use cookies to persist assignment
If the user changes browsers, the assignment is unknown
Server side persistence solves this
You pay in performance & scalability
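The pause semantics described above can be sketched in a few lines of Python. This is an illustration only, not PETRI's implementation: the PausableExperiment class and the bucket helper are hypothetical, and an in-memory set stands in for whichever persistence (cookie or server side) is actually used.

```python
import hashlib


def bucket(experiment: str, user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) for this experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


class PausableExperiment:
    """Hypothetical experiment that supports pausing without losing exposed users."""

    def __init__(self, name: str, test_percent: int) -> None:
        self.name = name
        self.test_percent = test_percent
        self.paused = False
        # Stand-in for persisted assignments (a cookie or a server-side store).
        self.exposed = set()

    def assign(self, user_id: str) -> str:
        if user_id in self.exposed:
            return "B"  # already exposed: maintain the NEW experience
        if self.paused:
            return "A"  # paused: no additional users are exposed
        if bucket(self.name, user_id) < self.test_percent:
            self.exposed.add(user_id)
            return "B"
        return "A"


experiment = PausableExperiment("mobile-view", test_percent=100)
print(experiment.assign("alice"))  # B (everyone is in the test at 100%)
experiment.paused = True
print(experiment.assign("bob"))    # A (not exposed while paused)
print(experiment.assign("alice"))  # B (keeps the experience she already had)
```

The persisted set is the part that costs performance and scalability once it moves server side, which is the trade-off named on the slide.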
Decision (What to do with the data)
• Keep feature
• Drop feature
• Improve code & resume experiment
Keep backwards compatibility for exposed users forever?
Migrate users to another equivalent feature
Drop it altogether (users lose data/work)
Numbers look good but sample size is small
We need more data!
Expand
Reaching statistical significance
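One common way to check whether "numbers look good" is more than noise is a two-proportion z-test on the conversion rates of the two groups. The slides do not say which test Wix uses, so the Python sketch below is an assumption chosen for illustration; the function name and the sample figures are made up.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for conversions conv_a/n_a vs conv_b/n_b."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, nothing to distinguish
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# The same 10% vs 15% conversion rates at two sample sizes (made-up numbers).
_, p_small = two_proportion_z_test(10, 100, 15, 100)
_, p_large = two_proportion_z_test(1000, 10000, 1500, 10000)
print(p_small < 0.05)  # False: too few users to call it
print(p_large < 0.05)  # True: same rates, enough data
```

The example shows why the experiment has to be expanded: identical rates are inconclusive on 100 users per group and conclusive on 10,000.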
Test Group (B):     25%   50%   75%   100%
Control Group (A):  75%   50%   25%   0%
Keep user experience consistent
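Expanding the test group without flipping anyone back to the old experience works if each user gets a fixed bucket and the exposure percentage acts as a threshold. A minimal Python sketch, assuming a hash-based bucket (the bucket and group functions are hypothetical, not PETRI's code):

```python
import hashlib


def bucket(experiment: str, user_id: str) -> int:
    """Stable bucket in [0, 100); it never changes for a given user and experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def group(experiment: str, user_id: str, test_percent: int) -> str:
    """Users whose bucket falls below the threshold are in the test group (B)."""
    return "B" if bucket(experiment, user_id) < test_percent else "A"


# Raising the threshold only ever moves users from A to B, never back.
for percent in (25, 50, 75, 100):
    print(percent, group("gallery-manager", "user-42", percent))
```

Since the bucket is fixed, a user who saw B at 25% still sees B at 50%, 75%, and 100%.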
Control Group (A)
Test Group (B)
Signed-in user
• Test group is determined by the user ID
• Guarantees toss consistency across browsers
Anonymous user (Home page)
• Test group is randomly determined
• Cannot guarantee a consistent experience across browsers
11% of Wix users use more than one desktop browser
Keeping consistent UX
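The difference between the two cases can be sketched as follows. For a signed-in user the toss is a pure function of the user ID, so any browser computes the same answer. For an anonymous user the toss is random and can only be remembered in that browser's cookies. The function names and the cookie name are assumptions made for this Python sketch.

```python
import hashlib
import random


def toss_by_user_id(experiment: str, user_id: str, test_percent: int = 50) -> str:
    """Signed-in user: deterministic, so every browser gets the same group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return "B" if int.from_bytes(digest[:8], "big") % 100 < test_percent else "A"


def toss_anonymous(experiment: str, cookies: dict, test_percent: int = 50,
                   rng=random) -> str:
    """Anonymous user: random toss, persisted only in this browser's cookies."""
    key = f"exp_{experiment}"  # hypothetical cookie name
    if key not in cookies:
        cookies[key] = "B" if rng.random() * 100 < test_percent else "A"
    return cookies[key]


# Two browsers, each with its own cookie jar.
chrome_cookies, firefox_cookies = {}, {}

# Signed in: the cookie jar is irrelevant, the result depends on the ID only.
print(toss_by_user_id("home-page", "user-42") == toss_by_user_id("home-page", "user-42"))  # True

# Anonymous: stable within one browser, but the two jars are tossed independently.
first = toss_anonymous("home-page", chrome_cookies)
print(first == toss_anonymous("home-page", chrome_cookies))  # True
other = toss_anonymous("home-page", firefox_cookies)  # may or may not match `first`
```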
Always exclude robots
Don’t let Google index a losing page
Don’t let bots affect statistics
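A simple way to apply both rules is to detect crawlers from the User-Agent header, serve them a fixed variant, and keep them out of the reported numbers. The Python sketch below is an assumption-heavy illustration: the pattern is deliberately short and far from a complete bot list, and serving the control group (A) to robots is a choice made here, not something the slides specify.

```python
import re
from typing import Optional, Tuple

# Illustrative pattern only; real bot detection needs a maintained list.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def is_robot(user_agent: Optional[str]) -> bool:
    """Treat a missing User-Agent as a robot (an assumption of this sketch)."""
    return not user_agent or BOT_PATTERN.search(user_agent) is not None


def assign(user_agent: Optional[str], tossed_group: str) -> Tuple[str, bool]:
    """Return (group to serve, whether to count the visit in statistics)."""
    if is_robot(user_agent):
        return "A", False  # fixed variant, excluded from reporting
    return tossed_group, True


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(assign(googlebot, "B"))  # ('A', False)
```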
# of active experiments    Possible # of states
10                         1,024
20                         1,048,576
30                         1,073,741,824
Possible states >= 2^(# experiments)
Wix has ~1000 active experiments
~1.071509e+301
Supporting 2^N different users is challenging
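The figures above follow directly from 2^N, assuming each experiment has exactly two options (more options only increase the count). A short Python check:

```python
def possible_states(experiments: int, options: int = 2) -> int:
    """Number of distinct experiment combinations a user could be in."""
    return options ** experiments


for n in (10, 20, 30):
    print(n, f"{possible_states(n):,}")
# 10 1,024
# 20 1,048,576
# 30 1,073,741,824

print(f"{possible_states(1000):.6e}")  # 1.071509e+301
```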
How do you know which experiment causes errors?
Managing an ever-changing production environment
Near real time user BI tools
Override options (URL parameters, cookies, headers…)
Specialized tools
Integrated into the product
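An override lets a developer or support engineer force a specific variant in order to reproduce what a user saw. A minimal Python sketch of the lookup, assuming a hypothetical parameter name and an arbitrary precedence order (URL parameter, then header, then cookie); neither comes from PETRI:

```python
from typing import Mapping, Optional

VALID_GROUPS = ("A", "B")


def resolve_group(experiment: str,
                  assigned: str,
                  query: Optional[Mapping[str, str]] = None,
                  headers: Optional[Mapping[str, str]] = None,
                  cookies: Optional[Mapping[str, str]] = None) -> str:
    """Return an explicit override if one is present, else the normal assignment."""
    key = f"exp_override_{experiment}"  # hypothetical parameter name
    for source in (query, headers, cookies):  # precedence is an assumption
        value = (source or {}).get(key)
        if value in VALID_GROUPS:
            return value
    return assigned


# A user assigned to A can be forced into B through the URL for debugging.
print(resolve_group("new-gallery", "A", query={"exp_override_new-gallery": "B"}))  # B
print(resolve_group("new-gallery", "A"))  # A
```

Invalid override values are ignored, so a malformed URL falls back to the regular assignment.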
Share document with other users
Document owner is part of a test that enables a new video component
What will the other user experience when editing a shared document?
Owner Friend
Assignment may be different than owner’s
Owner (B) Friend (A)
Enable features by existing content
• What will happen when you remove a component?
Enable features by document owner’s assignment
• The friend now expects to find the new feature on his own docs
Exclude experimental features from shared documents
• You are not really testing the entire system
Possible solutions
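The three options can be compared side by side as policies that decide whether the new component is available to whoever is editing. This Python sketch is only an illustration of the trade-offs; the SharingPolicy enum and the function are hypothetical names.

```python
from enum import Enum


class SharingPolicy(Enum):
    BY_EXISTING_CONTENT = "content"   # enabled if the doc already uses the feature
    BY_OWNER_ASSIGNMENT = "owner"     # enabled if the document owner is in B
    EXCLUDE_SHARED = "exclude"        # never enabled on shared documents


def new_component_enabled(policy: SharingPolicy,
                          owner_group: str,
                          editor_group: str,
                          doc_has_component: bool,
                          doc_is_shared: bool) -> bool:
    """Decide whether the current editor may use the experimental component."""
    if policy is SharingPolicy.BY_EXISTING_CONTENT:
        return editor_group == "B" or doc_has_component
    if policy is SharingPolicy.BY_OWNER_ASSIGNMENT:
        return owner_group == "B"
    if policy is SharingPolicy.EXCLUDE_SHARED:
        return editor_group == "B" and not doc_is_shared
    raise ValueError(f"unknown policy: {policy}")


# Owner is in B, friend is in A, and the friend edits the shared document.
for policy in SharingPolicy:
    print(policy.name, new_component_enabled(policy, "B", "A", True, True))
# BY_EXISTING_CONTENT True
# BY_OWNER_ASSIGNMENT True
# EXCLUDE_SHARED False
```

Under BY_EXISTING_CONTENT the result flips to False as soon as the component is removed from the document, which is the problem raised in the first option.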
Petri is more than just an A/B test framework
Feature toggle
A/B Test
Personalization
Internal testing
Continuous deployment
Jira integration
Experiments
Dynamic configuration
QA
Automated testing
Petri is an open source project
https://github.com/wix/petri
Q&A
https://github.com/wix/petri
http://goo.gl/dqyely
Modeled experiment lifecycle
Open source (developed using TDD from day 1)
Running at scale in production
No deployment necessary
Both back-end and front-end experiments
Flexible architecture
Why Petri
[Architecture diagram: PETRI Server, Your app, Laboratory, DB, Logs]

Editor's Notes

  • #5 Who here does A/B tests? Who plans to do A/B tests?
  • #6 A/B testing is embedded in our development process. Petri is based on our experience and the lessons we learned.
  • #7 You divide your users into groups and measure which group reached your goal
  • #8 What does “better” mean? What is your goal?
  • #9 Measure conversion to registration
  • #10 Not just for comparing pages but also in development
  • #12 The theory: we can make a better gallery. Our goal: make it easier for our users to build their sites (converting to premium)
  • #14 It is not about winning, it's about not losing
  • #15 Lessons learned from 5 years of experience. Petri allows PMs to manage their tests
  • #16 A screenshot of the UI we built on top of PETRI
  • #18 PM added a Premium link in the editor
  • #19 If we shorten the funnel, more users will reach the purchase page, thus increasing our sales. PM added a Premium link in the editor
  • #20 Why did it fail? T-shirt time
  • #21 Users were taken out of the editing context before they were happy with the results
  • #23 Who thinks we should start with 50%?
  • #24 Remember, a test could fail
  • #28 Product manager defines a limited new experiment
  • #29 Product manager defines a limited new experiment
  • #30 We also test new must-have features
  • #31 There is no A version. The control group just doesn't get it.
  • #32 We need to improve before releasing to all users.
  • #33 Lose mobile view? Unable to update?
  • #34 Pause is a temporary state until the system improves and the test resumes
  • #35 Server-side state: performance vs. correctness, cross-datacenter replicas
  • #38 The end result of every A/B test is reaching a decision. For this we need enough numbers. Add %, countries, etc.
  • #39 As discussed in the pause scenario, here too we cannot take away the ‘new’ experience
  • #40 For anonymous users, this is the best we can do. This means some users (~11%) will sometimes see different experiences.
  • #41 What would you expect the result to be for a bot? A? B? 2nd T-shirt time!
  • #45 Production is never in a ‘known’ state. At least 2^(# experiments) states, since an experiment can have more than 2 options.
  • #46 It is hard to know, and we don't always know exactly. Try to understand what was opened recently; recreate and eliminate.
  • #47 Overrides can also target a list of users.
  • #54 The obvious answer may be to allow the friend to edit the component if it's already in the site. But what if the friend deletes the component by mistake (or on purpose)? If he's assigned to A, he won't be able to add it back. Possible solution: assign by site owner instead of by user. This means you must implement server-side state, because you don't know the site owner's language, GEO, etc. at the time he was assigned; you only know his user ID. Not perfect: the user may experience something else on his own document.
  • #56 Expose features internally to company employees. Select assignment by sites (not only by users).
  • #60 santa
  • #62 Take screenshot from github!