Deterministically assign an id in a weighted bucket list
I'm running n split tests on a website. I want to assign each evenly distributed integer user id to one of the n buckets, and to do so deterministically so that the same user always gets the same test.
At this point, I can just pick an index in the list of split tests by modding the user id by n. What if I want to weight certain tests?
For example, bucket #1 of 21 is assigned 90% of the time and each of the remaining 20 tests is assigned 0.5% of the time.
I feel like I can somehow scale up the size of my list and still use the mod technique to accomplish this, but having potentially huge, temporary lists in memory seems inelegant.
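For reference, a minimal sketch of the current unweighted setup, assuming Python; N_TESTS and assign_bucket are illustrative names, not anything from the actual code:

```python
N_TESTS = 21  # number of split tests

def assign_bucket(user_id: int) -> int:
    # Same id always maps to the same bucket, and evenly distributed
    # ids spread uniformly across the N_TESTS buckets.
    return user_id % N_TESTS
```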
1 Answer
If most buckets have distinct sizes, where size is defined as a percentage of the ids, then you'll have to represent this in memory somehow. Otherwise, how else are you going to know these percentages?
One solution is to use, say, 100 virtual buckets, each representing 1% of the ids. Then associate 90 of the virtual buckets with bucket #1, perform a mod 100, and if the result falls in the first 90 virtual buckets, assign the id to bucket #1. You can get the optimal number of virtual buckets by dividing each bucket's percentage by the GCD of all the percentages, which in your example is 0.5 (GCD(90, 0.5) = 0.5): bucket #1 gets 90 / 0.5 = 180 virtual buckets and each of the other 20 buckets gets one, so you would mod by 200 in total (see the sketch below).
From your example, though, all buckets other than #1 share a single size. The best solution really depends on what types of arrangements you could have.
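A minimal sketch of this virtual-bucket approach, assuming Python and weights expressed as integers in tenths of a percent; WEIGHTS, build_virtual_buckets, and assign_weighted_bucket are hypothetical names, not an established API:

```python
from functools import reduce
from math import gcd

# Hypothetical weights in tenths of a percent: bucket 0 gets 90.0%,
# each of the other 20 buckets gets 0.5%; the weights sum to 1000.
WEIGHTS = [900] + [5] * 20

def build_virtual_buckets(weights):
    # Expand integer weights into a flat list of virtual buckets.
    # Dividing every weight by the GCD of all weights keeps the list as
    # small as possible: 180 + 20 = 200 entries for this example.
    g = reduce(gcd, weights)
    virtual = []
    for bucket_index, weight in enumerate(weights):
        virtual.extend([bucket_index] * (weight // g))
    return virtual

VIRTUAL = build_virtual_buckets(WEIGHTS)

def assign_weighted_bucket(user_id: int) -> int:
    # Deterministic: the same user id always lands in the same bucket, and
    # evenly distributed ids hit each bucket in proportion to its weight.
    return VIRTUAL[user_id % len(VIRTUAL)]

if __name__ == "__main__":
    print(len(VIRTUAL))                # 200 virtual buckets
    print(assign_weighted_bucket(42))  # always the same bucket for id 42
```

The expanded list is only as long as the sum of the weights divided by their GCD (200 entries here), so the huge temporary list the question worries about only appears if the weights share no convenient common divisor.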