How to prove the "Six Degrees of Separation" concept programmatically?
I have a database of 20 million users and the connections between those people. What is the most efficient way to prove the concept of "six degrees of separation" programmatically?
4 Answers
I think the most efficient way (worst case) is roughly O(N^3). Build an adjacency matrix A, then take that matrix to the powers 2, 3, 4, 5 and 6. Any pair whose entry is 0 in every matrix from A through A^6 is separated by more than six degrees, which disproves the claim; a sketch follows below.
Heuristically, you can try to single out subgraphs (large clumps of people who are connected to other clumps only through a relatively small number of "bridge" nodes), but there's absolutely no guarantee you'll find any.
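A minimal sketch of that idea, assuming a dense boolean adjacency matrix (realistic only for small N; a dense 20-million-row matrix would never fit in memory). The function name and the use of NumPy are my own choices, not part of the original answer:

```python
import numpy as np

def pairs_beyond_six_degrees(adj: np.ndarray) -> np.ndarray:
    """Return a boolean matrix marking pairs NOT connected within 6 hops.

    adj is a symmetric boolean N x N matrix; adj[i, j] is True when
    users i and j are directly connected.
    """
    n = adj.shape[0]
    # reach[i, j] is True once a path of length <= k is known to exist;
    # the identity matrix accounts for paths of length 0.
    reach = adj.astype(bool) | np.eye(n, dtype=bool)
    a = adj.astype(np.int64)  # int64 so path counts cannot overflow
    for _ in range(5):  # extend reachability to path lengths 2..6
        reach = reach | ((reach.astype(np.int64) @ a) > 0)
    return ~reach  # True entries are counterexamples to six degrees
```

Any True entry in the result disproves six degrees of separation for that pair of users.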
Well, a better answer has already been given, but off the top of my head I would have gone with the Floyd-Warshall all-pairs shortest-path algorithm, which is O(n^3). I'm unsure of the complexity of the graph-diameter algorithm, but it "sounds" like that would also be O(n^3). I'd appreciate clarification on this if anyone knows.
On a side note, do you really have such a database? Scary.
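For what it's worth, a toy Floyd-Warshall sketch, assuming `dist` is an N x N matrix holding 1 for direct connections, 0 on the diagonal and infinity elsewhere; at O(n^3) time and O(n^2) memory it is hopeless for 20 million users, but it shows the all-pairs idea:

```python
import math

def floyd_warshall(dist: list[list[float]]) -> None:
    """Relax all pairs in place; dist[i][j] ends as the shortest path length."""
    n = len(dist)
    for k in range(n):  # allow vertex k as an intermediate hop
        for i in range(n):
            dik = dist[i][k]
            for j in range(n):
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]

# The diameter is then the largest finite entry; six degrees holds iff
# every entry is <= 6 (and none is math.inf, i.e. the graph is connected).
```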
You just want to measure the diameter of the graph.
This is exactly the metric that captures the separation between the most distantly connected nodes in a graph.
There are lots of algorithms on Google, and in the Boost Graph Library too.
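The answer points at Boost; as an illustration, here is the same measurement with Python's networkx instead (its diameter routine is real, but the toy edge list is made up). Note that it runs an all-pairs shortest-path computation under the hood, so it is only practical on a small subgraph:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4), (1, 4), (4, 5)])  # toy data

# nx.diameter raises an error if G is disconnected, which by itself
# already disproves six degrees of separation.
print(nx.diameter(G))  # the claim holds for this graph iff this is <= 6
```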
You can probably fit the graph in memory (in a representation where each vertex knows the list of its neighbors).
Then, from each vertex n, you can run a breadth-first search (using a queue) to a depth of 6 and count the number of vertices visited. If not all vertices are visited, you have disproved the theorem. Otherwise, continue with the next vertex n. (A sketch follows below.)
This is O(N*(N + #edges)) = N*(N + 100N) ≈ 100*N^2 if users have 100 connections on average, which is not ideal for N = 20 million. I wonder whether the libraries mentioned above can compute the diameter with a better time complexity (the general algorithm is O(N^3)).
The computations for the individual vertices are independent, so they could be done in parallel.
A little heuristic: start with the vertices of lowest degree (they give a better chance of disproving the theorem).
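A sketch of that depth-limited BFS, assuming an in-memory adjacency list `neighbors` mapping every vertex to its neighbor list (the names are mine, not the answer's):

```python
from collections import deque

def within_six_degrees(neighbors: dict[int, list[int]], source: int) -> bool:
    """Return True if every vertex is reachable from source in <= 6 hops."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if depth[u] == 6:
            continue  # do not expand beyond depth 6
        for v in neighbors[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return len(depth) == len(neighbors)

def six_degrees_disproved(neighbors: dict[int, list[int]]) -> bool:
    # Heuristic from the answer: try the lowest-degree vertices first,
    # since they are the most likely counterexamples.
    order = sorted(neighbors, key=lambda v: len(neighbors[v]))
    return any(not within_six_degrees(neighbors, s) for s in order)
```

Each call to `within_six_degrees` is independent, so the loop in `six_degrees_disproved` is exactly the part that could be split across workers.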