The significance score we provide can be interpreted as the confidence we have that a result is significant, in other words, how sure you can be of your conclusions.
In statistics, a result with a significance score above 95% is usually considered statistically significant. This means the change you are seeing is unlikely to be due to chance alone, and it is reasonable to conclude that the difference comes from the change you are testing.
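To make the 95% threshold concrete, here is a minimal sketch of one common way such a score can be computed: a two-proportion z-test comparing conversion rates between two variants. The variant figures are made up, and the use of this particular test is an assumption for illustration; the score you see in the dashboard may be computed differently.

```python
# Hypothetical illustration: a two-proportion z-test on conversion rates.
# The variant numbers below are invented for the example.
from math import sqrt
from statistics import NormalDist

# (conversions, tracked searches) for variants A and B -- hypothetical figures
conv_a, n_a = 320, 4_000
conv_b, n_b = 380, 4_000

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)                  # pooled conversion rate
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))    # standard error of the difference
z = (p_b - p_a) / se

# Two-sided p-value; a "95% significance score" roughly corresponds to p < 0.05
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, p-value = {p_value:.3f}, significant = {p_value < 0.05}")
```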
Important Caveat
We only control certain parts of the incoming data, which means the statistical significance we present is not definitive.
We rely on a few things:
An even distribution of searches across userTokens
We assume that your searches are evenly distributed among your users, i.e. each userToken makes roughly the same number of searches on average. In some situations, such as when a back-end server sends many requests with the same userToken, the results may be skewed.
How to check it: look at the number of tracked searches. Does it match the expected split of the test?
In the example above, the first variant had 10,000 searches from 100 userTokens, which suggests that many searches are being sent from the same userToken, since an average of 100 searches per user is unlikely. Below it, we see 1,000 searches for 100 users, an average of 10 searches per user, which suggests a more even distribution. A result like the first one suggests a single source is generating searches and should be excluded from the test.
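As a rough sketch of this check, the snippet below counts searches per userToken from a hypothetical export of your own search logs and flags tokens sending far more searches than average. The log format, field names, and threshold are assumptions; adapt them to your own data.

```python
# Sketch: compute searches per userToken and flag heavy senders.
# `search_logs` is a hypothetical list of (userToken, query) records
# exported from your own logs or analytics.
from collections import Counter

search_logs = [
    ("user-1", "shoes"), ("user-1", "boots"), ("user-2", "hats"),
    ("backend-job", "sku-123"), ("backend-job", "sku-456"),  # same token, many searches
    # ... more records
]

searches_per_token = Counter(token for token, _ in search_logs)
average = sum(searches_per_token.values()) / len(searches_per_token)
print(f"average searches per userToken: {average:.1f}")

# Flag tokens sending far more searches than average (threshold is arbitrary)
for token, count in searches_per_token.most_common():
    if count > 10 * average:
        print(f"suspicious token: {token} ({count} searches)")
```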
A consistent test period
Any change to your index configuration during the A/B test can distort your results. This is not something we can detect when computing the significance score.
Problems with your events recording
If there is an issue with the recording of your events, it will have an impact on your A/B test results. Make sure you follow the steps to validate your events.
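As one rough illustration, assuming you can export a daily count of recorded events, the sketch below scans the test period for days with no events, which often points to a recording problem. The data structure is hypothetical and not part of any official tooling.

```python
# Sketch: look for days with zero recorded events during the test period.
# `daily_event_counts` is a hypothetical export built from your own event logs.
from datetime import date, timedelta

daily_event_counts = {
    date(2024, 5, 1): 1_240,
    date(2024, 5, 2): 1_180,
    # date(2024, 5, 3) missing -- a gap in recording
    date(2024, 5, 4): 1_205,
}

start, end = min(daily_event_counts), max(daily_event_counts)
day = start
while day <= end:
    if daily_event_counts.get(day, 0) == 0:
        print(f"{day}: no events recorded -- investigate before trusting the test")
    day += timedelta(days=1)
```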