选择随机行但有赔率

发布于 2024-09-12 10:55:44 字数 578 浏览 3 评论 0原文

我有一个行数据集,每行的“赔率”数字在 1 到 100 之间。我希望以最有效的方式做到这一点。赔率加起来不一定是 100。

我有一些想法。

一个) 选择整个数据集,然后将所有赔率相加并生成 1 到该数字之间的随机数。然后循环遍历数据集,从数字中扣除赔率,直到为 0。

我希望将对数据库的影响降至最低,因此我考虑是否只能选择我需要的行。

b)

SELECT * FROM table WHERE (100*RAND()) < odds

我考虑过LIMIT 0,1

但是如果项目具有相同的概率,则只会返回其中之一

或者获取整个数据集并从那里随机选择一个......但几率是受到影响,因为它变成了有赔率的随机,然后变成了无赔率的随机,因此赔率变得有利于更高的赔率(甚至更是如此)。

我想我可以按赔率排序 ASC,然后获取整个数据集,然后使用 PHP 从与第一条记录(最低)赔率相同的行中随机抽取。

似乎是一个笨拙的解决方案。

有人有更好的解决方案吗?如果不是,以上哪一项最好?

I have a dataset of rows each with an 'odds' number between 1 and 100. I am looking to do it in the most efficient way possible. The odds do not necessarily add up to 100.

I have had a few ideas.

a)
Select the whole dataset and then add all the odds up and generate a random number between 1 and that number. Then loop through the dataset deducting the odds from the number until it is 0.

I was hoping to minimize the impact on the database so I considered if I could only select the rows I needed.

b)

SELECT * FROM table WHERE (100*RAND()) < odds

I considered LIMIT 0,1

But then if items have the same probability only one of the will be returned

Alternatively take the whole dataset and pick a random one from there... but then the odds are affected as it becomes a random with odds and then a random without odds thus the odds become tilted in favour of the higher odds (even more so).

I guess I could order by odds ASC then take the whole dataset and then with PHP take a random out of the rows with the same odds as the first record (the lowest).

Seems like a clumsy solution.

Does anyone have a superior solution? If not which one of the above is best?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

时光清浅 2024-09-19 10:55:44

做一些前期工作,在表格中添加一些有助于选择的列。例如,假设您有这些行,

 X  2  
 Y  3
 Z  1

我们添加一些累积值,

 Key Odds Start  End 
 X    2     0     1      // range 0->1, 2 values == odds
 Y    3     2     4      // range 2->4, 3 values == odds
 Z    1     5     5      // range 5->5, 1 value == odds

选择“开始”和“结束”,如下所示。第一行从零开始。后续行的开始位置比前一个结束位置多一个。结束是(开始 + 赔率 - 1)。

现在在 0 到 Max(End) 范围内选择一个随机数 R

Select * from T where R >= T.Start and R <= T.End

如果数据库足够聪明,我们也许可以使用

 Select * from T where R >= T.Start and R <= (T.Start + T.Odds - 1)

我推测拥有带有索引的 End 列可能会提供更好的性能。此外,Max(End) 可能会被隐藏在某个地方,并在必要时通过触发器进行更新。

显然,更新开始/结束存在一些麻烦。 或插入以某种方式自然排序,那么这可能还不错,

  • 如果表内容稳定
  • 以便每个新行都从旧的最高行继续。

Do some up-front work, add some columns to your table that help the selection. For example suppose you have these rows

 X  2  
 Y  3
 Z  1

We add some cumulative values

 Key Odds Start  End 
 X    2     0     1      // range 0->1, 2 values == odds
 Y    3     2     4      // range 2->4, 3 values == odds
 Z    1     5     5      // range 5->5, 1 value == odds

Start and End are chosen as follows. The first row has a start of zero. Subsequent rows have a start one more than previous end. End is the (Start + Odds - 1).

Now pick a random number R in the range 0 to Max(End)

Select * from T where R >= T.Start and R <= T.End

If the database is sufficiently clever we may we be able to use

 Select * from T where R >= T.Start and R <= (T.Start + T.Odds - 1)

I'm speculating that having an End column with an index may give the better performance. Also the Max(End) perhaps gets stashed somewhere and updated by a trigger when ncessary.

Clearly there's some hassle in updating the Start/End. This may not be too bad if either

  • The table contents are stable
  • or insertions are in someway naturally ordered, so that each new row just continues from the old highest.
早茶月光 2024-09-19 10:55:44

如果您使用代码并添加 ORDER BY RAND()LIMIT 1 会怎样?

SELECT * FROM table WHERE (100*RAND()) < odds ORDER BY RAND() LIMIT 1

这样,即使你有相同概率的倍数,它也总是会随机排序,然后你只需要第一个条目。

What if you took your code, and added an ORDER BY RAND() and LIMIT 1?

SELECT * FROM table WHERE (100*RAND()) < odds ORDER BY RAND() LIMIT 1

This way, even if you have multiples of the same probability, it will always come back randomly ordered, then you just take the first entry.

萌无敌 2024-09-19 10:55:44
select * from table 
where id between 1 and 100 and ((id % 2) <> 0) 
order by NewId() 
select * from table 
where id between 1 and 100 and ((id % 2) <> 0) 
order by NewId() 
迷路的信 2024-09-19 10:55:44

唔。不完全清楚你想要什么结果,所以如果这有点疯狂,请耐心等待。话虽这么说,怎么样:

制作一张新桌子。该表是固定数据表,如下所示:

Odds
====
   1
   2
   2
   3
   3
   3
   4
   4
   4
   4
etc, 
etc.

然后从数据集连接到赔率列上的该表。对于表中的每一行,您将获得与该行的给定赔率一样多的行数。

然后随机选择其中一组。

Hmm. Not entirely clear what result you want, so bear with me if this is a bit crazy. That being said, how about:

Make a new table. The table is a fixed data table, and looks like this:

Odds
====
   1
   2
   2
   3
   3
   3
   4
   4
   4
   4
etc, 
etc.

Then join from your dataset to that table on the odds column. You'll get as many rows back for each row in your table as the given odds of that row.

Then just pick one of that set at random.

晚风撩人 2024-09-19 10:55:44

如果您在赔率列上有一个索引和一个主键,那么这将非常有效:

SELECT id, odds FROM table WHERE odds > 0

数据库甚至不必从表中读取,它会从赔率索引中获取所需的一切。

然后,您将选择一个介于 1 和返回的行数之间的随机值。

然后从返回的行数组中选择该行。

然后,最后选择整个目标行:

SELECT * FROM table WHERE id = ?

这确保了具有赔率值的所有行之间的均匀分布。


或者,将赔率放入不同的表中,并使用自动增量主键。

Odds
ID     odds
1      4
2      9
3      56
4      12

将ID外键代替赔率值存储在主表中,并为其建立索引。

首先,获取最大值。这永远不会触及数据库。它使用索引:

SELECT MAX(ID) FROM Odds

获取 1 和最大值之间的随机值。

然后选择记录。

SELECT * FROM table
JOIN Odds ON Odds.ID = table.ID
WHERE Odds.ID >= ?
LIMIT 1

如果您倾向于删除赔率值或回滚插入以保持分布均匀,则这将需要一些维护。

SQL Antipatterns 书中有一整章关于随机选择的内容。

If you have an index on the odds column, and a primary key, this would be very efficient:

SELECT id, odds FROM table WHERE odds > 0

The database wouldn't even have to read from the table, it would get everything it needed from the odds index.

Then, you'll select a random value between 1 and the number of rows returned.

Then select that row from the array of rows returned.

Then, finally, select the whole target row:

SELECT * FROM table WHERE id = ?

This assures an even distribution between all rows with an odds value.


Alternatively, put the odds in a different table, with an autoincrement primary key.

Odds
ID     odds
1      4
2      9
3      56
4      12

Store the ID foreign key in the main table instead of the odds value, and index it.

First, get the max value. This never touches the database. It uses the index:

SELECT MAX(ID) FROM Odds

Get a random value between 1 and the max.

Then select the record.

SELECT * FROM table
JOIN Odds ON Odds.ID = table.ID
WHERE Odds.ID >= ?
LIMIT 1

This will require some maintenance if you tend to delete Odds value or roll back inserts to keep the distribution even.

There is a whole chapter on random selection in the book SQL Antipatterns.

不即不离 2024-09-19 10:55:44

我没有尝试过,但也许是这样的(带有?从 0 到 SUM(odds) - 1 的随机数)?

SET @prob := 0;

SELECT
  T.*,
  (@prob := @prob + T.odds) AS prob
FROM table T
WHERE prob > ?
LIMIT 1

这与您的想法 a) 基本相同,但完全在一个(好吧,如果算上变量设置的话,技术上是两个)SQL 命令内。

I didn't try it, but maybe something like this (with ? a random number from 0 to SUM(odds) - 1)?

SET @prob := 0;

SELECT
  T.*,
  (@prob := @prob + T.odds) AS prob
FROM table T
WHERE prob > ?
LIMIT 1

This is basically the same as your idea a), but entirely within one (well, technically two if you count the variable set-up) SQL commands.

柠檬心 2024-09-19 10:55:44

适合 O(log(n)) 更新的通用解决方案是这样的:

  • 将对象存储为(平衡)树的叶子。
  • 在每个分支节点处,存储其下所有对象的权重。
  • 添加、删除或修改节点时,更新其父节点的权重。

然后选择 0 到(总重量 - 1)之间的数字并沿着树向下导航,直到找到正确的对象。

由于您不关心树中事物的顺序,因此可以将它们存储为由 N 个指针和 N-1 个数字组成的数组。

A general solution, suitable for O(log(n)) updates, is something like this:

  • Store objects as leaves of a (balanced) tree.
  • At each branch node, store the weights of all objects under it.
  • When adding, removing, or modifying nodes, update weights of their parents.

Then pick a number between 0 and (total weight - 1) and navigate down the tree until you find the right object.

Since you don't care about the order of things in the tree, you can store them as an array of N pointers and N-1 numbers.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文