如何识别网络中的节点簇

发布于 2024-11-09 03:50:09 字数 602 浏览 0 评论 0原文

我有一个表描述了几组连接的节点:

node
origin_node REFERENCES node
start_time
end_time

并且我想找出数据集包含多少个集群,例如如果记录是:

A, B, 10:00, 11:00
B, C, 9:00, 9:15
D, E, 10:00, 10:15
B, A, 13:00, 13:30
E, B, 12:00, 13:20
F, G, 9:00, 9:15

...那么我将有 2 个集群 {A,B,C,D,E } 和 {F,G}

(时间几乎无关紧要 - 它只是为了证明 node+origin_node 不一定是唯一/有序的)。

但我在制定一种从几千行中识别簇的算法时有点陷入困境。

我正在使用 MySQL 5.0.22 - 所以没有“CONNECT BY”,并且可以访问 PHP 和 awk - 尽管对我来说理解算法比理解编码解决方案更容易。只要分析数据的时间不超过几个小时,我就会倾向于简单而不是顺序。

顺便说一句:这是一个现实世界的问题 - 不是家庭作业(我很久以前就不再是学生了 - 也许太早了;)

TIA

I have a table describing several sets of connected nodes:

node
origin_node REFERENCES node
start_time
end_time

and I want to find out how many clusters the dataset contains, e.g. if the records were:

A, B, 10:00, 11:00
B, C, 9:00, 9:15
D, E, 10:00, 10:15
B, A, 13:00, 13:30
E, B, 12:00, 13:20
F, G, 9:00, 9:15

...then I'd have 2 clusters {A,B,C,D,E} and {F,G}

(the times are pretty much irrelevant - it's just there to demonstrate that node+origin_node is not necessarily unique / ordered).

But I'm a bit stuck in working out an algorithm which identifies the clusters from a few thousand rows.

I'm working with MySQL 5.0.22 - so no 'CONNECT BY', and have access to PHP and awk - although it'd be easier for me to understand an algorithm rather than a coded solution. And as long as it takes less than a couple of hours to analyse the data, I'd lean to simplicity over order.

BTW: its a real-world problem - not homework (I stopped being a student a long time ago - perhaps too early ;)

TIA

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

泪之魂 2024-11-16 03:50:09

对我来说,理解算法比理解编码解决方案更容易

尝试这些链接吗?

http://en.wikipedia.org/wiki/Cluster_analysis

http://en.wikipedia.org/wiki/Category:Data_clustering_algorithms

另外,虽然不是 MySQL,但 Microsoft 上也有一些东西站点:

http://msdn.microsoft.com/en-us/library/ms174879 .aspx


根据您的评论进行编辑:

在您的特定情况下,类似于创建闭包表的操作可能会起作用。

使用临时表...

从任意节点开始。将其分配给新集群。

下一个节点。是否存在从当前识别的集群到节点的链接?

  • 如果不是,则将其分配给新的集群。

  • 如果是,则将其分配给该集群。然后,对于每个链接,验证已处理的节点是否位于同一集群中。如果不是,请将它们重新分配到该集群。

it'd be easier for me to understand an algorithm rather than a coded solution

Tried these links?

http://en.wikipedia.org/wiki/Cluster_analysis

http://en.wikipedia.org/wiki/Category:Data_clustering_algorithms

Also, though not MySQL, there also is stuff on Microsoft's site:

http://msdn.microsoft.com/en-us/library/ms174879.aspx


Edit, per your comment:

In your particular case, something akin to creating a closure table might work.

Using a temporary table...

Start with an arbitrary node. Assign it to a new cluster.

Next node. Is there a link to a node from a currently identified cluster?

  • If no, assign it to a new cluster.

  • If yes, assign it to that cluster. Then, for each link, verify that already processed node is in the same cluster. If not, reassign them to that cluster.

傲影 2024-11-16 03:50:09

遍历网络并标记访问过的节点(类似于垃圾收集算法)。它相当高效,但需要相当多的代码。

Went with walking the network and flagging visited nodes (similar to garbage collection algorithms). Its reasonably efficient but needed quite a bit of code.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文