I am trying to do some statistical analysis of different A/B tests to see which alternative is better and have found conflicting information about this.
First, I am interested in a few different kinds of tests:
- Tests that measure success by counting events, such as conversions or emails sent
- Tests that measure success by counting revenue
- Tests that have only two alternatives (control and new)
- Tests that have multiple alternatives (control and multiple new)
I was hoping to find a simple set of formulae or rules for doing this analysis but have found more questions than answers.
This site says that you can't compare multi-alternative tests; you can only do pairwise comparisons and do a chi-squared analysis to see if the whole test is statistically significant or not.
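For reference, the "whole test" chi-squared check described above might look roughly like the sketch below. It is only an illustration under my own assumptions: the function names `chi2_sf` and `chi2_homogeneity` and the visitor/conversion counts are made up, and the p-value uses the closed-form tail of the chi-squared distribution for integer degrees of freedom so that only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in half-integer powers of x
    total, term = 0.0, math.sqrt(x)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total)

def chi2_homogeneity(conversions, visitors):
    """Pearson chi-squared test that all alternatives share one conversion rate."""
    pooled = sum(conversions) / sum(visitors)
    stat = 0.0
    for c, n in zip(conversions, visitors):
        expected_conv = n * pooled
        expected_fail = n * (1 - pooled)
        stat += (c - expected_conv) ** 2 / expected_conv
        stat += ((n - c) - expected_fail) ** 2 / expected_fail
    df = len(conversions) - 1
    return stat, df, chi2_sf(stat, df)

# Hypothetical A/B/C/D counts: control first, then three variants.
stat, df, p_value = chi2_homogeneity([100, 120, 130, 90], [1000, 1000, 1000, 1000])
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

A small p-value here only says that the alternatives differ somewhere; it does not say which one is best, which is presumably why that site pairs it with pairwise comparisons.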
This site suggests a way to do A/B/C/D testing (starts on slide 74), analysing the results using the G-test (which it says is related to chi-squared), but it isn't clear on the details of using a fudge factor. It also suggests that you can only use the A/B/C/D approach to eliminate alternatives until you end up with a clear winner in an A/B comparison.
This site gives an example of an A/B/C/D test (including control) and shows how to compare the conversion rates to determine a winner. Unlike the previous approach, it does not recommend eliminating alternatives but rather picks a winner right off the bat (assuming statistically significant results).
Perhaps I'm naive, but I would think that by now a stats analysis library would exist to deal with this very problem. I would also appreciate more information about what algorithms/equations are needed to solve these problems. It's been a long time since my university stats class.
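As far as I understand it, the G-test statistic is G = 2 * sum(O * ln(O / E)) over all observed/expected cell counts, compared against the same chi-squared distribution with (number of alternatives - 1) degrees of freedom. The sketch below is my own reading of that, with a hypothetical function name and made-up counts. It does not attempt the "fudge factor" from the slides, since I could not work out what it is.

```python
import math

# Standard chi-squared critical values at the 5% level, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}

def g_statistic(conversions, visitors):
    """G-test (log-likelihood ratio) statistic for equal conversion rates."""
    pooled = sum(conversions) / sum(visitors)
    total = 0.0
    for c, n in zip(conversions, visitors):
        cells = ((c, n * pooled), (n - c, n * (1 - pooled)))
        for observed, expected in cells:
            if observed > 0:  # a zero cell contributes nothing to the sum
                total += observed * math.log(observed / expected)
    return 2 * total, len(conversions) - 1

# Hypothetical A/B/C/D counts: control first, then three variants.
g, df = g_statistic([100, 120, 130, 90], [1000, 1000, 1000, 1000])
print(f"G = {g:.2f}, df = {df}, significant at 5%: {g > CRITICAL_05[df]}")
```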
Comments (1)
For the event-generating comparison, you could approach this using Beta distributions. Each alternative has some unobserved p, the probability of producing an event. If you observe X positive events out of N, then your uncertainty about p can be modeled by Beta(X+1, N-X+1), which is the posterior for p under a uniform prior.
You can compare two alternatives by looking at P(pA > pB), where pA and pB are the Beta-distributed event probabilities of the two alternatives. Methods for computing that inequality probability can be found in this paper.
You can also compute the expected effect size, E[pA - pB], or confidence bounds on that difference.
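As a small illustration of that model, the Beta(X+1, N-X+1) parameters and their mean and variance can be computed directly. The helper name `beta_posterior` and the counts are hypothetical, chosen only for this example.

```python
def beta_posterior(x, n):
    """Return (a, b, mean, variance) of Beta(x + 1, n - x + 1)."""
    a = x + 1          # events observed, plus one
    b = n - x + 1      # non-events observed, plus one
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return a, b, mean, variance

# Hypothetical data: 120 conversions out of 1000 visitors.
a, b, mean, variance = beta_posterior(120, 1000)
print(f"Beta({a}, {b}): mean = {mean:.4f}, sd = {variance ** 0.5:.4f}")
```

With no data at all (X = 0, N = 0) this reduces to Beta(1, 1), the uniform distribution on [0, 1].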
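If you don't want to work through the closed-form methods, a rough alternative is to estimate P(pA > pB) by simulation. Note that this is a Monte Carlo shortcut, not the method from the paper, and the function name, counts, and number of draws below are just assumptions for the sketch.

```python
import random

def prob_a_beats_b(x_a, n_a, x_b, n_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(pA > pB) under Beta(X+1, N-X+1) models."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = 0
    for _ in range(draws):
        p_a = rng.betavariate(x_a + 1, n_a - x_a + 1)
        p_b = rng.betavariate(x_b + 1, n_b - x_b + 1)
        if p_a > p_b:
            wins += 1
    return wins / draws

# Hypothetical data: A converts 120/1000, B converts 100/1000.
print(f"P(pA > pB) is roughly {prob_a_beats_b(120, 1000, 100, 1000):.3f}")
```

The estimate carries sampling noise of about 1/sqrt(draws), so more draws give a tighter answer at the cost of run time.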
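A sketch of that last step: the expected difference follows exactly from the two Beta means, while the bounds here are estimated by simulation as a central interval of the sampled differences. Strictly speaking that makes them Bayesian credible bounds rather than frequentist confidence bounds. The function name, counts, and 95% level are assumptions for the example.

```python
import random

def effect_size_summary(x_a, n_a, x_b, n_b, draws=50_000, level=0.95, seed=0):
    """Return (E[pA - pB], lower, upper) under Beta(X+1, N-X+1) models."""
    # The mean of Beta(a, b) is a / (a + b), so the expected difference is exact.
    mean_a = (x_a + 1) / (n_a + 2)
    mean_b = (x_b + 1) / (n_b + 2)
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(x_a + 1, n_a - x_a + 1)
        - rng.betavariate(x_b + 1, n_b - x_b + 1)
        for _ in range(draws)
    )
    # Central interval taken from the sorted simulated differences.
    tail = (1 - level) / 2
    lower = diffs[int(tail * draws)]
    upper = diffs[int((1 - tail) * draws) - 1]
    return mean_a - mean_b, lower, upper

# Hypothetical data: A converts 120/1000, B converts 100/1000.
effect, lower, upper = effect_size_summary(120, 1000, 100, 1000)
print(f"E[pA - pB] = {effect:.4f}, 95% interval = ({lower:.4f}, {upper:.4f})")
```

If the interval excludes zero, that points the same way as a P(pA > pB) close to 0 or 1.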