如何证明“六度分离”? 以编程方式概念?

发布于 2024-07-24 22:52:29 字数 201 浏览 3 评论 0原文

我有一个包含 2000 万用户以及这些人之间的联系的数据库。 如何在编程中以最有效的方式证明“六度分离”的概念?

有关六度分离的文章的链接

I have a database of 20 million users and connections between those people. How can I prove the concept of "Six degrees of separation" concept in the most efficient way in programming?

link to the article about Six degrees of separation

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

江挽川 2024-07-31 22:52:30

我认为最有效的方式(最坏情况)几乎是N^3。 构建一个邻接矩阵,然后采用该矩阵 ^2、^3、^4、^5 和 ^6。 查找图中矩阵为 0 到矩阵^6 的任何条目。

启发式地,您可以尝试挑选出子图(一大群人仅通过相对较少数量的“桥”节点连接到其他人群),但绝对不能保证您会拥有任何子图。

I think the most efficient way (worst case) is almost N^3. Build an adjacency matrix, and then take that matrix ^2, ^3, ^4, ^5 and ^6. Look for any entries in the graph that are 0 for matrix through matrix^6.

Heuristically you can try to single out subgraphs ( large clumps of people who are only connected to other clumps by a relatively small number of "bridge" nodes ) but there's absolutely no guarantee you'll have any.

白昼 2024-07-31 22:52:30

好吧,已经给出了更好的答案,但从我的角度来看,我会选择 Floyd-Warshall所有配对最短路径算法,其复杂度为O(n^3)。 我不确定图直径算法的复杂性,但它“听起来”也是 O(n^3)。 如果有人知道的话,我想对此进行澄清。

顺便说一句,你真的有这样的数据库吗? 可怕的。

Well a better answer has already been given, but off the top of my head I would have gone with the Floyd-Warshall all pairs shortest path algorithm, which is O(n^3). I'm unsure of the complexity of the graph diameter algorithm, but it "sounds" like this would also be O(n^3). I'd like clarification on this if anyone knows.

On a side note, do you really have such a database? Scary.

罪歌 2024-07-31 22:52:29

您只想测量图表的直径。
这正是找出图中最远连接的节点之间的分离的度量。

Google 上有很多算法,Boost graph 也有。

You just want to measure the diameter of the graph.
This is exactly the metric to find out the seperation between the most-distantly-connected nodes in a graph.

Lots of algorithms on Google, Boost graph too.

沫离伤花 2024-07-31 22:52:29

您可能可以将图形放入内存中(以每个顶点都知道其邻居列表的表示形式)。

然后,从每个顶点n,您可以运行广度优先搜索(使用队列)至深度 6 并计算访问的顶点数。 如果没有访问所有顶点,则证明该定理是错误的。 在其他情况下,继续处理下一个顶点n

这是 O(N*(N + #edges)) = N*(N + N*100) = 100N^2,如果用户平均有 100 个连接,这对于 N=2000 万来说并不理想。 我想知道上述库是否可以以更好的时间复杂度计算直径(一般算法为 O(N^3))。

各个顶点的计算是独立的,因此它们可以并行完成。

一点启发:从度数最低的顶点开始(反驳定理的机会更大)。

You can probably fit the graph in memory (in the representation that each vertex knows a list of its neighbors).

Then, from each vertex n, you can run a breadth-first search (using a queue) to the depth of 6 and count number of vertices visited. If not all vertices are visited, you have disproved the theorem. In other case, continue with next vertex n.

This is O(N*(N + #edges)) = N*(N + N*100) = 100N^2, if user has 100 connections on average, Which is not ideal for N=20 million. I wonder if the mentioned libraries can compute the diameter in better time complexity (general algorithm is O(N^3)).

The computations for individual vertices are independent, so they could be done in parallel.

A little heuristic: start with vertices that have the lowest degree (better chance to disprove the theorem).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文