基于比较的排名算法(变体)
这个问题是上一个问题的变体: 基于比较的排名算法
我想提出的变化是:如果通过以下方式解决循环会怎样?丢弃最早的矛盾选择,以便实际上可以使用传递算法。
在这里,我粘贴了原来的问题:
“我想对项目集合(大小可能大于 100,000)进行排名或排序,其中集合中的每个项目没有内在(可比较)值,相反,我所拥有的只是用户以“主观”方式提供的任意两个项目之间的比较
示例:
考虑具有元素[a,b,c,d]的集合并且用户的比较:
b>a,a>。 d, d > c
该集合的正确顺序是 [b, a, d, c]。
这个例子很简单,但可能有更复杂的情况:
由于比较是主观的,用户也可以说 c >。 ; b. 在这种情况下,您可能无法“连接”所有项目,即:
b > a,d > c。它可能是:[b,a,d,c]或[d,c,b,a]。在这种情况下,任何一个排序都是可以接受的
......
问题是:
是否存在可以解决该问题的算法 。如上所述,如果是这种情况,我不想花精力去想出一个。如果没有特定的算法,您是否可以向我指出某些类型的算法或技术?”
This question is a variation on a previous question:
Comparison-based ranking algorithm
The variation I would like to pose is: what if loops are solved by discarding the earliest contradicting choices so that a transitive algorithm could actually be used.
Here I have pasted the original question:
"I would like to rank or sort a collection of items (with size potentially greater than 100,000) where each item in the collection does not have an intrinsic (comparable) value, instead all I have is the comparisons between any two items which have been provided by users in a 'subjective' manner.
Example:
Consider a collection with elements [a, b, c, d]. And comparisons by users:
b > a, a > d, d > c
The correct order of this collection would be [b, a, d, c].
This example is simple however there could be more complicated cases:
Since the comparisons are subjective, a user could also say that c > b. In which case that would cause a conflict with the ordering above. Also you may not have comparisons that 'connects' all the items, ie:
b > a, d > c. In which case the ordering is ambiguous. It could be : [b, a, d, c] or [d, c, b, a]. In this case either ordering is acceptable.
...
The Question is:
Is there an algorithm which already exists that can solve the problem above, I would not like to spend effort trying to come up with one if that is the case. If there is no specific algorithm, is there perhaps certain types of algorithms or techniques which you can point me to?"
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
不存在“循环”的更简单版本可以使用拓扑排序来处理。
现在,对于更复杂的场景,如果对于每个“周期”,元素在最终排名中出现的顺序并不重要,那么您可以尝试以下操作:
将问题建模为定向 graph (即
a > b
意味着存在结果图中的边从节点开始“a”并以节点“b”结束)。计算图表的强连通分量 (SCC)。简而言之,SCC 是一组节点,具有以下属性:您可以通过遵循边列表从集合中的任何节点到达集合中的任何节点(这对应于原始问题中的“循环”)。< /p>
通过将每个节点“折叠”到它所属的 SCC 来转换图,但保留不同 SCC 之间的边。
最后,我们准备好了。拓扑排序应该告诉您在不同 SCC 中打印节点的正确顺序。对于同一 SCC 中的节点,无论您选择的顺序是什么,都会总是存在“循环”,因此可能会以随机顺序打印它们。
The simpler version where no "cycle" exists can be dealt with using topological sorting.
Now, for the more complex scenario, if for every "cycle" the order on which the elements appear in the final ranking does not matter, then you could try the following:
model the problem as a directed graph (i.e. the fact that
a > b
implies that there is an edge in the resulting graph starting in node "a" and ending in node "b").calculate the strongly connected components (SCC) of the graph. In short, an SCC is a set of nodes with the property that you can get to any node in the set from any node in the set by following a list of edges (this corresponds to your "cycles" in the original problem).
transform the graph by "collapsing" each node into the SCC it belongs to, but preserve the edges that that go between different SCC's.
Finally, we're ready. The topological sort should tell you the right order in which to print nodes in different SCC's. For the nodes in the same SCC's, no matter what the order you choose is, there will always be "cycles", so a possibility might be printing them in a random order.