选择随机行但有赔率
我有一个行数据集,每行的“赔率”数字在 1 到 100 之间。我希望以最有效的方式做到这一点。赔率加起来不一定是 100。
我有一些想法。
一个) 选择整个数据集,然后将所有赔率相加并生成 1 到该数字之间的随机数。然后循环遍历数据集,从数字中扣除赔率,直到为 0。
我希望将对数据库的影响降至最低,因此我考虑是否只能选择我需要的行。
b)
SELECT * FROM table WHERE (100*RAND()) < odds
我考虑过LIMIT 0,1
但是如果项目具有相同的概率,则只会返回其中之一
或者获取整个数据集并从那里随机选择一个......但几率是受到影响,因为它变成了有赔率的随机,然后变成了无赔率的随机,因此赔率变得有利于更高的赔率(甚至更是如此)。
我想我可以按赔率排序
ASC,然后获取整个数据集,然后使用 PHP 从与第一条记录(最低)赔率相同的行中随机抽取。
似乎是一个笨拙的解决方案。
有人有更好的解决方案吗?如果不是,以上哪一项最好?
I have a dataset of rows each with an 'odds' number between 1 and 100. I am looking to do it in the most efficient way possible. The odds do not necessarily add up to 100.
I have had a few ideas.
a)
Select the whole dataset and then add all the odds up and generate a random number between 1 and that number. Then loop through the dataset deducting the odds from the number until it is 0.
I was hoping to minimize the impact on the database so I considered if I could only select the rows I needed.
b)
SELECT * FROM table WHERE (100*RAND()) < odds
I considered LIMIT 0,1
But then if items have the same probability only one of the will be returned
Alternatively take the whole dataset and pick a random one from there... but then the odds are affected as it becomes a random with odds and then a random without odds thus the odds become tilted in favour of the higher odds (even more so).
I guess I could order by odds
ASC then take the whole dataset and then with PHP take a random out of the rows with the same odds as the first record (the lowest).
Seems like a clumsy solution.
Does anyone have a superior solution? If not which one of the above is best?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
做一些前期工作,在表格中添加一些有助于选择的列。例如,假设您有这些行,
我们添加一些累积值,
选择“开始”和“结束”,如下所示。第一行从零开始。后续行的开始位置比前一个结束位置多一个。结束是(开始 + 赔率 - 1)。
现在在 0 到 Max(End) 范围内选择一个随机数 R
如果数据库足够聪明,我们也许可以使用
我推测拥有带有索引的 End 列可能会提供更好的性能。此外,Max(End) 可能会被隐藏在某个地方,并在必要时通过触发器进行更新。
显然,更新开始/结束存在一些麻烦。 或插入以某种方式自然排序,那么这可能还不错,
Do some up-front work, add some columns to your table that help the selection. For example suppose you have these rows
We add some cumulative values
Start and End are chosen as follows. The first row has a start of zero. Subsequent rows have a start one more than previous end. End is the (Start + Odds - 1).
Now pick a random number R in the range 0 to Max(End)
If the database is sufficiently clever we may we be able to use
I'm speculating that having an End column with an index may give the better performance. Also the Max(End) perhaps gets stashed somewhere and updated by a trigger when ncessary.
Clearly there's some hassle in updating the Start/End. This may not be too bad if either
如果您使用代码并添加
ORDER BY RAND()
和LIMIT 1
会怎样?这样,即使你有相同概率的倍数,它也总是会随机排序,然后你只需要第一个条目。
What if you took your code, and added an
ORDER BY RAND()
andLIMIT 1
?This way, even if you have multiples of the same probability, it will always come back randomly ordered, then you just take the first entry.
唔。不完全清楚你想要什么结果,所以如果这有点疯狂,请耐心等待。话虽这么说,怎么样:
制作一张新桌子。该表是固定数据表,如下所示:
然后从数据集连接到赔率列上的该表。对于表中的每一行,您将获得与该行的给定赔率一样多的行数。
然后随机选择其中一组。
Hmm. Not entirely clear what result you want, so bear with me if this is a bit crazy. That being said, how about:
Make a new table. The table is a fixed data table, and looks like this:
Then join from your dataset to that table on the odds column. You'll get as many rows back for each row in your table as the given odds of that row.
Then just pick one of that set at random.
如果您在赔率列上有一个索引和一个主键,那么这将非常有效:
数据库甚至不必从表中读取,它会从赔率索引中获取所需的一切。
然后,您将选择一个介于 1 和返回的行数之间的随机值。
然后从返回的行数组中选择该行。
然后,最后选择整个目标行:
这确保了具有赔率值的所有行之间的均匀分布。
或者,将赔率放入不同的表中,并使用自动增量主键。
将ID外键代替赔率值存储在主表中,并为其建立索引。
首先,获取最大值。这永远不会触及数据库。它使用索引:
获取 1 和最大值之间的随机值。
然后选择记录。
如果您倾向于删除赔率值或回滚插入以保持分布均匀,则这将需要一些维护。
SQL Antipatterns 书中有一整章关于随机选择的内容。
If you have an index on the odds column, and a primary key, this would be very efficient:
The database wouldn't even have to read from the table, it would get everything it needed from the odds index.
Then, you'll select a random value between 1 and the number of rows returned.
Then select that row from the array of rows returned.
Then, finally, select the whole target row:
This assures an even distribution between all rows with an odds value.
Alternatively, put the odds in a different table, with an autoincrement primary key.
Store the ID foreign key in the main table instead of the odds value, and index it.
First, get the max value. This never touches the database. It uses the index:
Get a random value between 1 and the max.
Then select the record.
This will require some maintenance if you tend to delete Odds value or roll back inserts to keep the distribution even.
There is a whole chapter on random selection in the book SQL Antipatterns.
我没有尝试过,但也许是这样的(带有?从 0 到
SUM(odds) - 1
的随机数)?这与您的想法 a) 基本相同,但完全在一个(好吧,如果算上变量设置的话,技术上是两个)SQL 命令内。
I didn't try it, but maybe something like this (with ? a random number from 0 to
SUM(odds) - 1
)?This is basically the same as your idea a), but entirely within one (well, technically two if you count the variable set-up) SQL commands.
适合 O(log(n)) 更新的通用解决方案是这样的:
然后选择 0 到(总重量 - 1)之间的数字并沿着树向下导航,直到找到正确的对象。
由于您不关心树中事物的顺序,因此可以将它们存储为由 N 个指针和 N-1 个数字组成的数组。
A general solution, suitable for O(log(n)) updates, is something like this:
Then pick a number between 0 and (total weight - 1) and navigate down the tree until you find the right object.
Since you don't care about the order of things in the tree, you can store them as an array of N pointers and N-1 numbers.