Python 在社交图谱上使用广度优先搜索

发布于 2024-10-08 10:15:42 字数 288 浏览 14 评论 0原文

我读了很多关于如何使用广度优先搜索、dfs、A*等的stackoverflow问题，问题是什么是最佳用法以及如何在现实与模拟图中实现它。例如，

考虑一下您有 Twitter/Facebook/某些社交网站的社交图，对我来说，搜索算法似乎会按如下方式工作：

如果用户 A 有 10 个朋友，那么其中一个有 2 个朋友，另一个有 3 个。搜索将首先找出用户 A 的朋友是谁，然后它必须查找这 10 个用户中每个用户的朋友在哪里。对我来说这似乎是bfs？

但是，我不确定这是否是实现该算法的方法。

谢谢，

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

意中人 2024-10-15 10:15:42

对于我的两分钱，如果您只是想遍历整个图，那么使用什么算法并不重要，只要它只命中每个节点一次即可。当您注意到时，这似乎就是您所说的：

我只是想遍历整个图表

这意味着您的术语在技术上存在缺陷 - 您正在谈论行走图表，而不是搜索图表。除非您实际上正在尝试搜索特定的东西，而您似乎在问题中根本没有提及。

话虽如此，Facebook 和 Twitter 是非常不同的图结构，确实会影响您如何行走它们：

Facebook 本质上是一个无向图。如果 X 是 Y 的朋友，Y 也必须是 X 的朋友。（或者与 X 存在关系或相关，等等）。
Twitter 本质上是一个有向图。如果你 X 关注 Y，Y 就不必关注 X。

这些问题将显着影响图行走算法。说实话，如果你只想访问所有节点，你还需要图吗？为什么不直接迭代所有这些呢？如果您在某个可迭代的数据结构 MY_DATA 中拥有所有节点，则可以有一个如下所示的生成器表达式：

def nodeGenerator(MY_DATA)
    for node in MY_DATA:
        yield node

显然，您需要调整 nodeGenerator 内部结构来处理实际访问节点的方式。话虽如此，大多数图结构都实现了节点迭代器。然后，您可以随时通过以下方式创建一个迭代器：

 for node in nodeGenerator(MY_DATA):
     (Do something here)

也许我只是错过了问题的要点，但目前您已经提出了一个关于搜索算法的问题，但没有搜索问题。由于优化和搜索的没有免费的午餐本质，任何搜索算法的价值都将完全由取决于您要检查的搜索问题。

即使在相同的数据集中也是如此。毕竟，如果您要搜索名字以字母 D 开头的每个人，一个很好的方法就是按字母顺序对每个人进行排序并进行二分搜索。相反，如果您想找到每个人与凯文·培根的分离程度，您将需要一个从培根先生开始并递归迭代每个认识他的人和他们认识的人的算法。这些都是你可以在 Facebook 或 Twitter 上做的事情，但如果没有任何具体细节，实际上没有办法推荐算法。因此，如果您什么都不知道，只需将每个人作为列表进行迭代即可。它和其他东西一样好。如果您想优化，请缓存所有计算。

For my two cents, if you're just trying to traverse the whole graph it doesn't matter a whole lot what algorithm you use so long as it only hits each node once. This seems to be what you're saying when you note:

I'm just trying to traverse the whole graph

This means your terminology is technically flawed- you're talking about walking a graph, not searching a graph. Unless you're actually trying to search for something in particular, which you don't seem to mention in the question at all.

With that said, Facebook and Twitter are very different graph structures that do have an impact on how you walk them:

Facebook is fundamentally an undirected graph. If X is friends with Y, Y MUST be friends with X. (Or in a relationship with, or related to, etc).
Twitter is fundamentally a directed graph. If you X follows Y, Y does not have to follow X.

These issues will significantly impact the graph walking algorithm. To be quite honest, if you just want to visit all the nodes, do you even need a graph? Why not just iterate over all of them? If you have all the nodes in some data structure MY_DATA that is iterable, you could just have a generator expression like this:

def nodeGenerator(MY_DATA)
    for node in MY_DATA:
        yield node

Clearly, you'd need to adjust the nodeGenerator internals to handle how you're actually accessing the nodes. With that said, most graph structures implement a node iterator. Then you can just create an iterator anytime you want to do things via:

 for node in nodeGenerator(MY_DATA):
     (Do something here)

Maybe I'm just missing the point of the question here, but at present you've posed a question about search algorithms without a search problem. Due to the No Free Lunch nature of optimization and search, the worth of any search algorithm will be entirely dependent on the search problem you're trying to examine.

This is true even among the same data set. After all, if you were searching for everybody whose name starts with the letter D, a great approach would be to just sort everyone alphabetically and do a binary search. If instead you're trying to find everyone's degree of separation from Kevin Bacon, you're going to want and algorithm that starts with Mr. Bacon and recursively iterates over everyone who knows him and everyone who they know. These are both things you COULD do on Facebook or Twitter, but without any specifics there's really no way to recommend an algorithm. Hence, if you know nothing, just iterate over everyone as a list. It's just as good as anything else. If you then want to optimize, cache any calculations.

回复收藏 0 原文