For a moment, imagine this: you’ve consistently tracked the number of unique visitors coming to your website for a few months. Then, one morning, you head to your analytics dashboard and notice that traffic has doubled.
You didn’t launch any new marketing campaigns. Your affiliate partners haven’t done any promotions for your site. And, come to think of it, didn’t your team just turn off some of your paid ads?
But if the data tells you traffic has doubled, it must be true, right? Unfortunately, according to Twyman’s Law, probably not.
Anyone relying on A/B testing or running controlled experiments online to make business decisions must be familiar with Twyman’s Law. That’s why, in this article, we’re going to look at what Twyman’s Law is, why it matters for A/B testing, and how to keep it from undermining your experiments.
By the end of this post, you’ll understand that even though Twyman’s Law may seem discouraging at times, it’s ultimately a guiding asset to your business’s long-term growth. For now, though, let’s get clear on precisely what Twyman’s Law is and how it can impact your business.
Twyman’s Law is an idea that states, “any figure which looks interesting or different is usually wrong.” It’s commonly attributed to UK radio and media audience researcher Tony Twyman, though he never formally published the law (or even put it in writing, for that matter). And yet, Twyman’s Law is one of the governing principles that prevent bad data from negatively impacting business decisions.
For example, look at what happened to the team over at VWO, an online platform specializing in A/B testing. Their marketing team measured conversions from the homepage based on various segments, one of which was “device used.” They noticed that people using Windows had a 400% higher conversion rate than people using Mac OS X.
Before making any significant changes to their marketing strategy, though, the VWO team investigated. It turns out that they’d recently installed Quality Assurance (QA) software to test the signup forms on their homepage every hour. And as luck would have it, the software ran on Windows as it conducted these tests. Lo and behold, the 400% difference in the data was nothing more than their QA software doing its job.
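To make this concrete, here’s a minimal sketch (in Python, using made-up data and column names rather than VWO’s actual pipeline) of how you might break conversions down by segment and automatically flag any segment whose rate deviates wildly from the overall rate, which is exactly the kind of anomaly the QA-software traffic produced:

```python
# A minimal sketch of a segment-level sanity check (hypothetical data and
# column names). The idea: break conversion rate down by segment and flag
# any segment that deviates wildly from the overall rate.
import pandas as pd

# Hypothetical visit-level data: one row per visitor.
visits = pd.DataFrame({
    "device": ["Windows", "Windows", "Mac OS X", "Mac OS X", "Windows", "Mac OS X"],
    "converted": [1, 1, 0, 1, 1, 0],
})

by_segment = visits.groupby("device")["converted"].agg(["mean", "count"])
overall_rate = visits["converted"].mean()

# Flag segments converting at more than 3x (or less than 1/3 of) the overall
# rate -- an arbitrary threshold chosen here purely for illustration.
suspicious = by_segment[(by_segment["mean"] > 3 * overall_rate) |
                        (by_segment["mean"] < overall_rate / 3)]
print(suspicious)
```

In practice you’d run this on real visit logs and tune the threshold, but the habit is the same: when one segment looks wildly better than the rest, investigate before you celebrate.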
And this is exactly what Twyman’s Law is here to remind you: "the more unusual or interesting the data, the more likely they are to have been the result of an error of one kind or another."
Twyman’s Law is important for A/B testing because it prevents your team from letting errors in the data guide business decisions. It helps you weed confirmation bias out of the process and ensures that you verify anomalies in the data rather than assuming they’re accurate.
In other words, it forces you to be suspicious of any large changes in your data before popping the champagne.
In a paper called “Unexpected Results in Online Controlled Experiments,” Ronny Kohavi and Roger Longbotham provide multiple examples of Twyman’s Law in action. Plus, they outline why these examples should change your approach to A/B testing. One of the specific problems outlined by Ronny and Roger relates to exposure control:
"The MSN US Home Page redirects users from some countries to their local country: if you visit 'www.msn.com' from an IP in India or the UK, the assumption is that you want to see the local MSN Home Page and are thus redirected automatically or semi-automatically (a popup shows up with a question). Many international sites (e.g., Google) implement this reverse-IP lookup to raise awareness of their local sites and help users. When a new version of the MSN US Home Page was tested in a controlled experiment, the reverse-IP lookup was not yet implemented for the new page. Consequently, the results were highly biased because the population of users from non-US IPs was much higher in the Treatment than in the Control."
Had the team testing these parameters at MSN blindly trusted the data, they would’ve wrongly concluded that the new MSN US Home Page was getting more engagement than the control page. In reality, the inflated numbers came from the new page failing to redirect visitors from other countries to their local home pages, which skewed the Treatment population.
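One way to catch this class of problem early is a sample ratio mismatch (SRM) check: compare the number of users actually assigned to Treatment and Control against the split you planned. Below is a minimal sketch assuming a 50/50 split and purely hypothetical user counts; it simply runs a chi-square test on the assignment counts:

```python
# A minimal sketch of a sample-ratio-mismatch (SRM) check, assuming a planned
# 50/50 split. The counts below are hypothetical; the point is that a
# population skew like the MSN reverse-IP issue shows up as a statistically
# significant deviation from the intended split.
from scipy.stats import chisquare

control_users = 50_000
treatment_users = 53_400  # e.g., extra non-US users leaking into Treatment

observed = [control_users, treatment_users]
total = sum(observed)
expected = [total / 2, total / 2]  # planned 50/50 assignment

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
if p_value < 0.001:  # a deliberately strict threshold for SRM alarms
    print(f"Possible sample ratio mismatch (p = {p_value:.2e}); investigate before trusting results.")
else:
    print("Assignment counts look consistent with the planned split.")
```

A failed SRM check doesn’t tell you what went wrong, only that the populations being compared aren’t what you intended, so the experiment’s results shouldn’t be trusted until you find the cause.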
This is a great example of how being aware of Twyman’s Law makes you more skeptical of your data, especially when it seems too good to be true. It’s a yellow flag reminding you to slow down and ask more questions.
Now that we know how Twyman’s Law works, one question remains: how can we run smarter controlled experiments and keep Twyman’s Law from cropping up altogether? Kohavi gives three tips for conducting Twyman-proof data experiments.
If checklists can save the lives of pilots, astronauts, and medical patients, they can help your team develop smarter digital experiments. In a paper delivered at the 41st ACM/IEEE International Conference on Software Engineering, the authors offered three checklists for running more trustworthy controlled experiments online. Below is an example of one checklist:
The platform you choose — such as Microsoft’s Experimentation Platform — may automate several of the steps above. The authors suggest creating modified checklists that include the actions you must do manually.
In an A/A test, the control and the variation are identical. In other words, A/A tests run matching versions against each other. If the platform is trustworthy, your A/A test should report no differences.
However, A/A tests often fail, and when they do, they highlight flaws you can address before running your A/B test. Run many A/A tests to ensure your tool is trustworthy. Kohavi explains how on LinkedIn:
“The idea is simple: take an A/B testing system, split the users into two groups, but make B identical to A (hence the name A/A test). If the system is operating correctly, then in repeated trials about 5% of the time a given metric will be statistically significant with a p-value less than 0.05. More generally, for each non-discrete metric, the distribution of p-values from repeated trials should be close to a uniform distribution.”
By running isolated A/A tests first, you can validate your A/B testing tooling and have more confidence in the quality of your results.
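To see what Kohavi’s description looks like in practice, here’s a minimal simulation sketch (using simulated conversion data rather than a real experimentation platform): it runs many A/A comparisons and checks that roughly 5% of p-values land below 0.05 and that the p-value distribution looks roughly uniform.

```python
# A minimal sketch of repeated A/A tests on simulated data. Both groups are
# drawn from the same distribution, so roughly 5% of runs should come out
# "significant" at p < 0.05, and the p-values should be close to uniform.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
num_trials, users_per_group = 1_000, 5_000
p_values = []

for _ in range(num_trials):
    # Identical treatment and control: the same simulated 10% conversion rate.
    a = rng.binomial(1, 0.10, size=users_per_group)
    b = rng.binomial(1, 0.10, size=users_per_group)
    p_values.append(ttest_ind(a, b).pvalue)

p_values = np.array(p_values)
print(f"Fraction of runs with p < 0.05: {(p_values < 0.05).mean():.3f}")  # expect ~0.05
print("Deciles of the p-value distribution:",
      np.round(np.quantile(p_values, np.linspace(0.1, 0.9, 9)), 2))
```

If far more (or far fewer) than 5% of your A/A runs come back significant, or the p-values bunch up instead of spreading evenly, that’s a sign the platform or the analysis has a flaw worth fixing before you trust any A/B result.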
Though your experiment may focus on just one or two features, you should identify a much larger set of metrics that you think should remain unchanged throughout your experiment. These metrics are your guardrails to ensure your testing parameters will give you the isolated results you’re looking for.
For example, imagine you’re testing two versions of your homepage: one Control page and one Variant. You want to try some new headlines to see whether they have an impact on your key conversion metrics.
But throughout the test, your Variant page takes twice as long to load as the Control. Since you weren’t changing the design or adding any heavy media files to the page, this would be a yellow flag indicating that you should investigate. After all, a simple change in a few headlines shouldn’t affect how quickly the page loads, meaning your controlled experiment likely has a bug in the system.
By getting clear on which metrics shouldn’t be affected by your A/B tests, you can be more vigilant about the results of your experiment.
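As a concrete illustration, here’s a minimal sketch of a guardrail check on page load time, using made-up numbers and an arbitrary 5% slowdown threshold; the point is simply to test guardrail metrics alongside your primary metric and flag unexpected regressions.

```python
# A minimal sketch of a guardrail check on a metric that should not move --
# here, page load time. Data and threshold are hypothetical; the idea is to
# flag regressions in metrics your change shouldn't have touched.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
control_load_ms = rng.normal(loc=800, scale=150, size=10_000)
variant_load_ms = rng.normal(loc=1600, scale=300, size=10_000)  # variant loads twice as slowly

stat, p_value = ttest_ind(variant_load_ms, control_load_ms)
relative_change = variant_load_ms.mean() / control_load_ms.mean() - 1

if p_value < 0.01 and relative_change > 0.05:  # guardrail: >5% slowdown is a yellow flag
    print(f"Guardrail breached: load time up {relative_change:.0%} (p = {p_value:.1e}). "
          "Check for bugs before reading the experiment's results.")
else:
    print("Guardrail metric looks stable.")
```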
Encountering Twyman’s Law can be discouraging. Still, for those of us obsessed with data, hunting for new trends and patterns in controlled tests is part of the fun. That’s because interesting results can lead to impactful breakthroughs – the kinds of breakthroughs that ultimately drive higher growth.
But let’s be honest: insights from your data are only as valuable as the accuracy of the data itself.
Rather than viewing Twyman’s Law as a pessimistic principle that only serves to rain on your digital parade, think of it as the wise guardian keeping a close eye on your experiments. And remember, just like in many areas of life, if something seems too good to be true, it probably is.
If you’d like to learn more about running controlled A/B tests and how to consistently get the results you’re after, check out Ronny Kohavi’s upcoming course on Accelerating Innovation with A/B Testing. Ronny has 20+ years of experience working at tech giants like Amazon, Microsoft, Airbnb, and more. He’s been a pioneer in data mining and machine learning, and this intimate course lets you learn directly from his decades’ worth of experience.
Click here to register and secure your seat for Ronny’s course today!