How to identify clusters of nodes in a network

Posted on 2024-11-09 03:50:09

I have a table describing several sets of connected nodes:

node
origin_node REFERENCES node
start_time
end_time

and I want to find out how many clusters the dataset contains, e.g. if the records were:

A, B, 10:00, 11:00
B, C, 9:00, 9:15
D, E, 10:00, 10:15
B, A, 13:00, 13:30
E, B, 12:00, 13:20
F, G, 9:00, 9:15

...then I'd have 2 clusters {A,B,C,D,E} and {F,G}

(The times are pretty much irrelevant; they're only there to show that a node/origin_node pair is not necessarily unique or ordered.)

But I'm a bit stuck working out an algorithm that identifies the clusters in a few thousand rows.
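Treating each row as an undirected edge, this is the problem of counting the connected components of a graph. One common way to do it is a union-find (disjoint-set) structure: every row merges the sets holding its two nodes, and at the end each distinct set root is one cluster. The Python below is a minimal sketch, assuming the rows have already been fetched (e.g. via PHP or an export) into `(node, origin_node, start_time, end_time)` tuples; `find`, `union` and `clusters` are illustrative names, not part of any library.

```python
# Union-find (disjoint-set) sketch for counting connected components.
# Assumption: rows are (node, origin_node, start_time, end_time) tuples already
# fetched from the table; find/union/clusters are illustrative names only.

def find(parent, x):
    """Return the root of x's set, compressing the path along the way."""
    root = x
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every node on the walked path straight at the root.
    while parent[x] != root:
        next_x = parent[x]
        parent[x] = root
        x = next_x
    return root


def union(parent, a, b):
    """Merge the sets containing a and b (no-op if already merged)."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a


def clusters(rows):
    parent = {}
    for node, origin, _start, _end in rows:  # times are ignored
        # Each node starts as its own one-member set the first time it is seen.
        parent.setdefault(node, node)
        parent.setdefault(origin, origin)
        # Treat the row as an undirected edge: A,B and B,A merge the same sets.
        union(parent, node, origin)
    # Group every node under the root of its set.
    groups = {}
    for n in parent:
        groups.setdefault(find(parent, n), set()).add(n)
    return list(groups.values())


rows = [
    ("A", "B", "10:00", "11:00"),
    ("B", "C", "9:00", "9:15"),
    ("D", "E", "10:00", "10:15"),
    ("B", "A", "13:00", "13:30"),
    ("E", "B", "12:00", "13:20"),
    ("F", "G", "9:00", "9:15"),
]

result = clusters(rows)
print(len(result))                         # 2
print(sorted(sorted(c) for c in result))   # [['A', 'B', 'C', 'D', 'E'], ['F', 'G']]
```

Because every row is handled as an undirected edge, duplicate or reversed pairs such as A,B and B,A need no special treatment, and each row is touched only once, so a few thousand rows is a small input for this approach.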

I'm working with MySQL 5.0.22, so there's no 'CONNECT BY', and I have access to PHP and awk, although an algorithm would be easier for me to understand than a coded solution. As long as analysing the data takes less than a couple of hours, I'd favour simplicity over algorithmic efficiency (order).
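Without recursive queries, an option that stays close to SQL's set-at-a-time style is min-label propagation (a different technique from union-find, swapped in here for illustration): give every node a label equal to its own name, then repeatedly relabel each node with the smallest label among itself and its direct neighbours until a whole pass changes nothing. In principle each pass could be a single UPDATE against a hypothetical `cluster_id` column, repeated from a PHP loop; that SQL is not shown, and the Python below only simulates the passes in memory as a sketch. `label_propagation` and `group` are illustrative names.

```python
# Min-label propagation sketch: a swapped-in, set-oriented alternative to
# union-find. label_propagation/group are illustrative names; in a database the
# labels would live in a hypothetical cluster_id column.

def label_propagation(edges):
    """Return ({node: label}, passes) once no label changes in a full pass."""
    labels = {}
    for a, b in edges:
        labels.setdefault(a, a)   # every node starts labelled with its own name
        labels.setdefault(b, b)

    passes = 0
    while True:
        passes += 1
        # Read only the previous pass's labels, like one set-based UPDATE.
        new_labels = dict(labels)
        for a, b in edges:
            # Edges are undirected, so the smaller label flows both ways.
            low = min(labels[a], labels[b])
            new_labels[a] = min(new_labels[a], low)
            new_labels[b] = min(new_labels[b], low)
        if new_labels == labels:  # nothing changed: labels are final
            return labels, passes
        labels = new_labels


def group(labels):
    """Collect nodes that share a label into one cluster."""
    clusters = {}
    for node, label in labels.items():
        clusters.setdefault(label, set()).add(node)
    return clusters


edges = [("A", "B"), ("B", "C"), ("D", "E"), ("B", "A"), ("E", "B"), ("F", "G")]
labels, passes = label_propagation(edges)
groups = group(labels)
print(len(groups))                                     # 2
print({k: sorted(v) for k, v in sorted(groups.items())})
# {'A': ['A', 'B', 'C', 'D', 'E'], 'F': ['F', 'G']}
print(passes)                                          # 4
```

The smallest label spreads one hop per pass, so the number of passes depends on the longest chain of hops in a cluster rather than on the number of rows. Here it is 4: D is three hops from A (A, B, E, D), and one final pass confirms that nothing changed.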

BTW: it's a real-world problem, not homework (I stopped being a student a long time ago, perhaps too early ;)

TIA
