Why does the clustering coefficient from my program differ from the igraph R library?
I'm writing a C++ program that calculates the clustering coefficient [CC] (local and global) of an undirected graph in DOT format. My problem is that the results of my program don't match the output from R (with the igraph library):
My program:
The cluster coefficient of "0" is: 0.257 (88/342)
The cluster coefficient of "1" is: 0.444 (40/90)
The cluster coefficient of "10" is: 1.000 (2/2)
The cluster coefficient of "2" is: 0.418 (46/110)
The cluster coefficient of "11" is: 1.000 (2/2)
The cluster coefficient of "12" is: 0.667 (8/12)
The cluster coefficient of "3" is: 0.346 (54/156)
The cluster coefficient of "5" is: 0.571 (24/42)
The cluster coefficient of "13" is: 1.000 (12/12)
The cluster coefficient of "4" is: 0.607 (34/56)
The cluster coefficient of "7" is: 0.679 (38/56)
The cluster coefficient of "14" is: 1.000 (6/6)
The cluster coefficient of "15" is: 0.833 (10/12)
The cluster coefficient of "16" is: 1.000 (6/6)
The cluster coefficient of "17" is: 0.733 (22/30)
The cluster coefficient of "9" is: 0.833 (10/12)
The cluster coefficient of "18" is: 0.714 (30/42)
The cluster coefficient of "19" is: 1.000 (6/6)
The cluster coefficient of "6" is: 1.000 (2/2)
The cluster coefficient of "8" is: 0.733 (22/30)
Here the quoted names are the nodes of the graph, and the (n/m) numbers are "the links between the vertices within its neighborhood" (n) and "the number of links that could possibly exist between them" (m), respectively (description from Wikipedia).
And the output from R:
0 0.2631579 x (+2 links)
1 0.4666667 x (+2 links)
2 0.4181818
3 0.3461538
4 0.6071429
5 0.6190476 x (+2 links)
6 1.0000000
7 0.6785714
8 0.6666667 x (-2 links)
9 0.8000000
10 1.0000000
11 1.0000000
12 0.6666667
13 1.0000000
14 1.0000000
15 0.8333333
16 1.0000000
17 0.7333333
18 0.7142857
19 1.0000000
Here the first number in each row is the node, the second is its local CC, and the third is my annotation where the value doesn't match my output (the number of links (n) I would need to add or remove to match R's output).
The second problem I have is that the global CC from R does not match my definition or Wikipedia's (unless I have misunderstood the formula). The output from R for this graph is 0.458891 and mine is 0.742.
So I did it manually: I calculated the CC of node 8 by hand, and it matches my program's output. So my question is: is it even possible that the igraph library has a bug? And if the answer is no, what am I missing?
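The hand check for node 8 can be reproduced with a few lines of Python. This is only a sketch: the names `build_adjacency` and `local_cc` and the adjacency-set representation are illustrative choices, not part of the C++ program or of igraph. It keeps only the edges from the graph file that touch node 8 or join two of its neighbours, counts each link among the neighbours once, and divides by k(k-1)/2. The program output above counts every link in both directions (22/30), which gives the same ratio.

```python
from itertools import combinations

# Edges from the graph file that involve node 8 or join two of its neighbours.
EDGES = [
    (8, 0), (8, 1), (8, 2), (8, 3), (8, 4), (8, 5),  # node 8 to its neighbours
    (1, 0), (2, 0), (3, 0), (4, 0), (5, 0),          # neighbour pairs involving node 0
    (2, 1), (4, 1), (3, 2), (5, 2), (4, 3), (5, 3),  # remaining neighbour pairs
]

def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_cc(adj, node):
    """Links among the neighbours of `node` divided by the possible links."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return float("nan")  # undefined for fewer than two neighbours
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

adj = build_adjacency(EDGES)
print(round(local_cc(adj, 8), 3))  # 11 links out of 15 possible -> 0.733
```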
The graph file is this one:
graph {
1 -- 0;
10 -- 0;
10 -- 2;
11 -- 0;
11 -- 2;
12 -- 0;
12 -- 1;
12 -- 3;
12 -- 5;
13 -- 0;
13 -- 3;
13 -- 4;
13 -- 7;
14 -- 0;
14 -- 1;
14 -- 4;
15 -- 0;
15 -- 2;
15 -- 3;
16 -- 0;
16 -- 15;
16 -- 3;
17 -- 0;
17 -- 1;
17 -- 2;
17 -- 5;
17 -- 7;
17 -- 9;
18 -- 0;
18 -- 1;
18 -- 2;
18 -- 3;
18 -- 4;
18 -- 7;
19 -- 0;
19 -- 18;
19 -- 3;
2 -- 0;
2 -- 1;
3 -- 0;
3 -- 2;
4 -- 0;
4 -- 1;
4 -- 3;
5 -- 0;
5 -- 2;
5 -- 3;
6 -- 0;
6 -- 3;
7 -- 0;
7 -- 1;
7 -- 2;
7 -- 3;
7 -- 4;
8 -- 0;
8 -- 1;
8 -- 2;
8 -- 3;
8 -- 4;
8 -- 5;
9 -- 0;
9 -- 1;
9 -- 5;
}
The way I calculated the CC with R was to load the graph (or generate a new one, because it can't read dot files) into a variable, say f, and run transitivity(f) for the global CC and transitivity(f, "local") for the local ones.
Thanks a lot for reading and sorry for my bad English.
Comments (1)
One of the authors of igraph here.
I have just loaded your graph into igraph (the Python interface) and its results match yours to the last digit. Which version of igraph are you using?
As for the "global" clustering coefficient, note that there are at least two conflicting definitions:
Counting the triangles in the entire network and dividing by the number of possible triangles (that is, the fraction of connected triples of nodes that are closed). This is the "real" global clustering coefficient, and igraph calculates this by default.
Calculating the local clustering coefficient of each node and taking the average. This is the "average local" clustering coefficient, and it is the one you are calculating.
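The two definitions can be compared directly on the graph from the question. The sketch below uses only the Python standard library rather than igraph, so the helper names (`parse_edges`, `closed_and_possible`) are illustrative assumptions and not igraph's API. It computes the ratio of closed to connected triples over the whole graph, and separately the mean of the per-node ratios.

```python
import re
from itertools import combinations

# Edge list copied from the graph file in the question, one source node per line.
DOT_EDGES = """
1 -- 0;
10 -- 0; 10 -- 2;
11 -- 0; 11 -- 2;
12 -- 0; 12 -- 1; 12 -- 3; 12 -- 5;
13 -- 0; 13 -- 3; 13 -- 4; 13 -- 7;
14 -- 0; 14 -- 1; 14 -- 4;
15 -- 0; 15 -- 2; 15 -- 3;
16 -- 0; 16 -- 15; 16 -- 3;
17 -- 0; 17 -- 1; 17 -- 2; 17 -- 5; 17 -- 7; 17 -- 9;
18 -- 0; 18 -- 1; 18 -- 2; 18 -- 3; 18 -- 4; 18 -- 7;
19 -- 0; 19 -- 18; 19 -- 3;
2 -- 0; 2 -- 1;
3 -- 0; 3 -- 2;
4 -- 0; 4 -- 1; 4 -- 3;
5 -- 0; 5 -- 2; 5 -- 3;
6 -- 0; 6 -- 3;
7 -- 0; 7 -- 1; 7 -- 2; 7 -- 3; 7 -- 4;
8 -- 0; 8 -- 1; 8 -- 2; 8 -- 3; 8 -- 4; 8 -- 5;
9 -- 0; 9 -- 1; 9 -- 5;
"""

def parse_edges(text):
    """Build {node: set of neighbours} from 'a -- b' pairs."""
    adj = {}
    for u, v in re.findall(r"(\d+)\s*--\s*(\d+)", text):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def closed_and_possible(adj, node):
    """Return (links among neighbours, possible links) for one node."""
    nb = adj[node]
    closed = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
    return closed, len(nb) * (len(nb) - 1) // 2

adj = parse_edges(DOT_EDGES)
pairs = [closed_and_possible(adj, n) for n in adj]

# Definition 1: closed triples over all connected triples in the whole graph.
transitivity = sum(c for c, _ in pairs) / sum(p for _, p in pairs)

# Definition 2: mean of the per-node ratios (every node here has degree >= 2).
average_local = sum(c / p for c, p in pairs) / len(pairs)

print(f"{transitivity:.3f} {average_local:.3f}")  # 0.450 0.742
```

On this edge list the second value matches the 0.742 reported in the question. The first is close to, but not identical to, the 0.458891 reported from R, which may mean that the graph built in the R session was not exactly this one.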