How to rank a million images with a crowdsourced sort

Posted 2024-07-06 18:46:40

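I'd like to rank a collection of landscape images by making a game in which site visitors rate them, in order to find out which images people find most appealing.

What would be a good method of doing that?

Hot-or-Not style? That is, show a single image and ask the user to rank it from 1 to 10. As I see it, this lets me average the scores, and I would only need to ensure an even distribution of votes across all the images. Fairly simple to implement.
Pick A-or-B? That is, show two images and ask the user to pick the better one. This is appealing because there is no numerical ranking, just a comparison. But how would I implement it? My first thought was to do it as a quicksort, with humans providing the comparison operations, and once it completes, simply repeat the sort ad infinitum.

How would you do it?

If you need numbers, I'm talking about one million images on a site with 20,000 daily visits. I'd imagine a small proportion of visitors might play the game; for the sake of argument, let's say I can generate 2,000 human sort operations a day. It's a non-profit website, and the curious will find it through my profile :)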

記柔刀 2024-07-13 18:46:40

As others have said, ranking 1-10 does not work that well because people have different levels.

The problem with the Pick A-or-B method is that the results are not guaranteed to be transitive (A can beat B, B can beat C, and C can beat A). Non-transitive comparison operators break sorting algorithms. With quicksort, for example, the items not chosen as the pivot can end up incorrectly ranked against each other.
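To make the transitivity problem concrete, here is a minimal Python sketch. The helper name `violations` and the three-image vote list are made up for illustration. It counts how many pairwise votes a given ordering contradicts, and shows that with a cycle no ordering can satisfy every vote.

```python
from itertools import permutations

def violations(order, votes):
    """Count pairwise votes (winner, loser) that contradict a ranking.

    `order` lists items from best to worst.
    """
    position = {item: i for i, item in enumerate(order)}
    return sum(1 for winner, loser in votes if position[winner] > position[loser])

# Hypothetical cyclic votes: A beats B, B beats C, C beats A.
cyclic_votes = [("A", "B"), ("B", "C"), ("C", "A")]

# Hypothetical transitive votes for comparison.
transitive_votes = [("A", "B"), ("B", "C"), ("A", "C")]

# Best achievable number of contradicted votes over all six orderings.
fewest_cyclic = min(violations(p, cyclic_votes) for p in permutations("ABC"))
fewest_transitive = min(violations(p, transitive_votes) for p in permutations("ABC"))
```

With the cyclic votes every ordering contradicts at least one vote, so a comparison sort has no correct answer to converge on; with the transitive votes one ordering satisfies them all.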

At any given time, you want an absolute ranking of all the pictures (even if some/all of them are tied). You also want your ranking not to change unless someone votes.

I would use the Pick A-or-B (or tie) method, but determine ranking similar to the Elo ratings system which is used for rankings in 2 player games (originally chess):

The Elo player-rating system compares players’ match records against their opponents’ match records and determines the probability of the player winning the matchup. This probability factor determines how many points a player’s rating goes up or down based on the results of each match. When a player defeats an opponent with a higher rating, the player’s rating goes up more than if he or she defeated a player with a lower rating (since players should defeat opponents who have lower ratings).

The Elo System:

All new players start out with a base rating of 1600.
WinProbability = 1 / (10^((Opponent’s Current Rating - Player’s Current Rating) / 400) + 1)
ScoringPt = 1 point if they win the match, 0 if they lose, and 0.5 for a draw.
Player’s New Rating = Player’s Old Rating + (K-Value * (ScoringPt - Player’s WinProbability))

Replace "players" with pictures and you have a simple way of adjusting both pictures' rating based on a formula. You can then perform a ranking using those numeric scores. (K-Value here is the "Level" of the tournament. It's 8-16 for small local tournaments and 24-32 for larger invitationals/regionals. You can just use a constant like 20).

With this method, you only need to keep one number for each picture which is a lot less memory intensive than keeping the individual ranks of each picture to each other picture.
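As a rough illustration of the formulas above applied to pictures, here is a minimal Python sketch. The function names `expected_score` and `update_elo` are made up for this example, and K defaults to the constant 20 suggested above. It is one possible reading of the update rule, not a reference implementation.

```python
def expected_score(rating, opponent_rating):
    """WinProbability from the formula above."""
    return 1.0 / (10 ** ((opponent_rating - rating) / 400) + 1)

def update_elo(rating_a, rating_b, score_a, k=20):
    """Return the new ratings of pictures A and B after one comparison.

    score_a is the ScoringPt for A: 1 if A was picked, 0 if B was
    picked, 0.5 for a tie. B's score is the complement.
    """
    expected_a = expected_score(rating_a, rating_b)
    expected_b = expected_score(rating_b, rating_a)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - expected_b)
    return new_a, new_b

# Two new pictures at the base rating; A is picked.
new_a, new_b = update_elo(1600, 1600, 1.0)  # -> (1610.0, 1590.0)
```

Because both pictures use the same K, the points gained by one are lost by the other, so the total rating across all pictures stays constant.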

EDIT: Added a little more meat based on comments.

柏林苍穹下 2024-07-13 18:46:40

Most naive approaches to the problem have some serious issues. The worst is how bash.org and qdb.us display quotes: users can vote a quote up (+1) or down (-1), and the list of best quotes is sorted by the total net score. This suffers from a horrible time bias: older quotes have accumulated huge numbers of positive votes through simple longevity, even if they're only marginally humorous. This algorithm might make sense if jokes got funnier as they got older but, trust me, they don't.

There are various attempts to fix this - looking at the number of positive votes per time period, weighting more recent votes, implementing a decay system for older votes, calculating the ratio of positive to negative votes, etc. Most suffer from other flaws.

The best solution, I think, is the one that the websites The Funniest, The Cutest, The Fairest, and Best Thing use: a modified Condorcet voting system.

The system gives each one a number based on, out of the things that it has faced, what percentage of them it usually beats. So each one gets the percentage score NumberOfThingsIBeat / (NumberOfThingsIBeat + NumberOfThingsThatBeatMe). Also, things are barred from the top list until they've been compared to a reasonable percentage of the set.
If there's a Condorcet winner in the set, this method will find it. Since that's unlikely, given the statistical nature, it finds the one that's the "closest" to being a Condorcet winner.
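The percentage score described above can be sketched in a few lines of Python. This is a simplification under stated assumptions: it counts individual votes rather than distinct opponents, and the eligibility rule is reduced to a hypothetical `min_comparisons` threshold, since the exact rule those sites use is not given here.

```python
from collections import defaultdict

def win_rates(votes, min_comparisons=1):
    """Score each item as wins / (wins + losses).

    `votes` is an iterable of (winner, loser) pairs. Items with fewer
    than `min_comparisons` total comparisons are left out, mirroring
    the idea of barring under-compared items from the top list.
    """
    wins = defaultdict(int)
    losses = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        losses[loser] += 1

    rates = {}
    for item in set(wins) | set(losses):
        total = wins[item] + losses[item]
        if total >= min_comparisons:
            rates[item] = wins[item] / total
    return rates

# Hypothetical votes among three images.
votes = [("a", "b"), ("a", "c"), ("b", "c")]
rates = win_rates(votes)
ranking = sorted(rates, key=rates.get, reverse=True)  # -> ['a', 'b', 'c']
```

Image "a" beats everything it faced, so in this tiny set it is the Condorcet winner and comes out on top.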

For more information on implementing such systems the Wikipedia page on Ranked Pairs should be helpful.

The algorithm requires people to compare two objects (your Pick-A-or-B option), but frankly, that's a good thing. I believe it's very well accepted in decision theory that humans are vastly better at comparing two objects than they are at abstract ranking. Millions of years of evolution make us good at picking the best apple off the tree, but terrible at deciding how closely the apple we picked hews to the true Platonic Form of appleness. (This is, by the way, why the Analytic Hierarchy Process is so nifty...but that's getting a bit off topic.)

One final point to make is that SO uses an algorithm to find the best answers which is very similar to bash.org's algorithm to find the best quote. It works well here, but fails terribly there - in large part because an old, highly rated, but now outdated answer here is likely to be edited. bash.org doesn't allow editing, and it's not clear how you'd even go about editing decade-old jokes about now-dated internet memes even if you could... In any case, my point is that the right algorithm usually depends on the details of your problem. :-)

无声情话 2024-07-13 18:46:40

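I know this question is quite old, but I thought I'd contribute.

I'd look at the TrueSkill system developed at Microsoft Research. It is similar to Elo but converges much faster (it looks exponential rather than linear), so you get more out of each vote. It is, however, mathematically more complex.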

http://en.wikipedia.org/wiki/TrueSkill

花之痕靓丽 2024-07-13 18:46:40

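I don't like the Hot-or-Not style. Different people will pick different numbers even if they all like the image exactly the same amount. Also, I hate rating things out of 10; I never know which number to choose.

Pick A-or-B is much simpler and more fun. You see two images, and the comparison is made between images on the site.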

热血少△年 2024-07-13 18:46:40

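These equations from Wikipedia make calculating Elo ratings simpler and more efficient. The algorithm for images A and B would be simple:

Get $N_e$, $m_A$, $m_B$ and the ratings $R_A$, $R_B$ from your database.
Calculate $K_A$, $K_B$, $Q_A$, $Q_B$ using the number of comparisons performed ($N_e$), the number of times that image has been compared ($m$), and the current ratings:

$K = \frac{800}{N_e + m}$

$Q_A = 10^{R_A/400}$

$Q_B = 10^{R_B/400}$

Calculate $E_A$ and $E_B$:

$E_A = \frac{Q_A}{Q_A + Q_B}$

$E_B = \frac{Q_B}{Q_A + Q_B}$

Score the result $S$: 1 for the winner, 0 for the loser, and 0.5 each for a draw.
Calculate the new rating for both images using:
$R' = R + K (S - E)$
Update the new ratings $R_A$, $R_B$ and the counts $m_A$, $m_B$ in the database.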
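The steps above could be sketched in Python as follows. The function names are hypothetical, the database reads and writes are left out, and the per-image K-factor is assumed to take the form 800 / (Ne + m); treat that formula as an assumption rather than something this answer confirms.

```python
def k_factor(n_effective, m):
    """Assumed K-factor: shrinks as an image accumulates comparisons."""
    return 800 / (n_effective + m)

def elo_update(r_a, r_b, s_a, k_a, k_b):
    """One comparison between images A and B using the Q/E formulation.

    s_a is 1 if A won, 0 if B won, 0.5 for a draw.
    Returns the new ratings (r_a, r_b).
    """
    q_a = 10 ** (r_a / 400)
    q_b = 10 ** (r_b / 400)
    e_a = q_a / (q_a + q_b)  # expected score of A
    e_b = q_b / (q_a + q_b)  # expected score of B
    s_b = 1 - s_a
    return r_a + k_a * (s_a - e_a), r_b + k_b * (s_b - e_b)

# Hypothetical values: Ne = 30, each image compared 10 times so far.
k_a = k_factor(30, 10)  # 20.0
k_b = k_factor(30, 10)  # 20.0
new_r_a, new_r_b = elo_update(1600, 1600, 1, k_a, k_b)
```

Because each image has its own K, a frequently compared image moves less per vote than a new one, so the two rating changes no longer have to cancel out.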

作妖 2024-07-13 18:46:40

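You may want to use a combination.

First phase:
Hot-or-Not style (although I would go with a three-option vote: Sucks, Meh/OK, Cool!)

Once you've sorted the set into the three buckets, I would pick two images from the same bucket and continue with a pairwise comparison between them. You could then use an English football style system of promotion and relegation to move the top few "Sucks" images into the Meh/OK bucket, in order to refine the edge cases.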

等待我真够勒 2024-07-13 18:46:40

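Ranking 1-10 won't work; everyone calibrates differently. Someone who always gives ratings of 3-7 would have their rankings eclipsed by people who always give 1 or 10.

A-or-B is more workable.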

丶情人眼里出诗心の 2024-07-13 18:46:40

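Wow, I'm late to the game.

I like the Elo system very much, but as Owen says, it seems to me that you'd be slow to build up any significant results.

I believe humans have a much greater capacity than just comparing two images, but you want to keep interaction to a bare minimum.

So how about showing n images (n being any number you can comfortably display on a screen; this may be 10, 20, or 30, perhaps depending on the user's preference) and asking the user to pick the one they think is best in that lot. Now back to Elo. You need to modify your rating system, but keep the same spirit. You have in fact compared one image to n-1 others. So you do your Elo rating n-1 times, but you should divide the change in rating by n-1 to match (so that results with different values of n are consistent with one another).

You're done. You now have the best of all worlds: a simple rating system that handles many images in a single click.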
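One way to read this scheme in Python is sketched below. The names `expected_score` and `pick_best_update` are made up, K is assumed to be a constant 20, and two interpretation choices are assumptions: every expectation is computed from the ratings as they stood before the click, and both the winner's gain and each loser's loss are divided by n-1.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score of `rating` against `opponent_rating`."""
    return 1.0 / (10 ** ((opponent_rating - rating) / 400) + 1)

def pick_best_update(ratings, shown, winner, k=20):
    """Apply one 'pick the best of n' click as n-1 scaled Elo updates.

    ratings: dict mapping image id -> current rating (not modified).
    shown:   ids of the n images displayed together.
    winner:  id of the image the user picked.
    Returns a new dict with the updated ratings.
    """
    losers = [image for image in shown if image != winner]
    new_ratings = dict(ratings)
    if not losers:
        return new_ratings

    for loser in losers:
        # The winner scored 1 against this loser; scale the change by 1/(n-1).
        delta = k * (1 - expected_score(ratings[winner], ratings[loser])) / len(losers)
        new_ratings[winner] += delta
        new_ratings[loser] -= delta
    return new_ratings

# Hypothetical screen of three images, all at the base rating.
before = {"a": 1600.0, "b": 1600.0, "c": 1600.0, "d": 1500.0}
after = pick_best_update(before, ["a", "b", "c"], "a")
# "a" gains 10 in total (5 from each opponent); "b" and "c" lose 5 each.
```

With n = 2 this reduces to an ordinary Elo update for a win, which is the consistency across values of n that the division by n-1 is meant to provide.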
