层次图连接性指标

发布于 2024-11-25 11:09:56 字数 890 浏览 2 评论 0原文

这是我在 Stack Overflow 上的第一个问题。这并不是一个真正的编程问题,但由于我们大多数人在某些时候都必须处理理论问题,并且周围可能有一些图论专家,所以我想我可以尝试一下。

我目前正在对多语言网站进行一些研究,并在网站结构中发现了一些有趣的模式。下图是两个不同的多语言网站的网站图。抱歉,我没有足够的代表点来发布图像,因此我将它们保留为链接。我使用 Force Atlas 算法进行布局。顶点根据页面语言着色。阴影区域对应于特定语言的子图。

这是网站的图表,其中相同内容的不同语言版本紧密相连。因此,代表不同语言版本的平面是重叠的。

http://www.ai。 soc.i.kyoto-u.ac.jp/~julien/phd/images/tight.png

在第二张图中,我们有一个网站,其中网站的语言版本几乎是独立的,因此我们几乎没有重叠。

http://www.ai。 soc.i.kyoto-u.ac.jp/~julien/phd/images/loose.png

所以这是我的问题:

是否有一个具体的指标来量化这种重叠?如果是这样,它的名称是什么?

由于我使用了基于力的布局,因此语言子图之间的边数。所以我想像取子图中的边数与进入/离开特定子图的边数之比之类的东西可能会起作用。我确信我不是第一个想到这个想法的人,所以我想知道这个指标是否有一个名称。然后我可以从那里谷歌它:)

提前谢谢你!

this is my first question on Stack Overflow. This is not really a programming question but since most of us have to deal with theoretical problems at some point and there might be some graph theory specialists around, I thought I might give it a go.

I am currently doing some research on multilingual websites and I found some interesting patterns in the website structure. The graphs below are the website graphs of two different multilingual websites. Sorry, I don't have enough rep points to post images so I leave them as links. I used the Force Atlas algorithm for the layout. Vertices are colored according to the page language. The shaded areas correspond to the subgraphs of a specific language.

Here is the graph of the website where different language versions of the same content are very closely linked. Hence the planes representing the different language versions are overlapping.

http://www.ai.soc.i.kyoto-u.ac.jp/~julien/phd/images/tight.png

In this second graph, we have a website where language versions of a website are almost independent, thus we have almost no overlap.

http://www.ai.soc.i.kyoto-u.ac.jp/~julien/phd/images/loose.png

So here is my question:

Is there a specific metric to quantify this overlap? If so, what is it named?

Since I used a force-based layout, the number of edges between the language subgraphs. So I guess something like taking the ratio of the number of edges within the subgraph to the number edges going outside/coming inside a specific subgraph might do the trick. I am sure I am not the first to get this idea so I was wondering if this metric had a name. I could then Google it from there :)

Thank you in advance!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

愿得七秒忆 2024-12-02 11:09:56

听起来您正在寻找的是网络模块化。给定一个图和一个分区(将图分解为不相交的子图),模块化定义为:

属于给定组的边的分数减去
如果边缘随机分布,则期望有这样的分数。

模块化是网络上一些最早的社区检测算法的基础,这些算法试图找到一组密集连接的节点。最近,模块化已被证明是社区检测的一个糟糕指标,尽管由于分辨率限制无法检测小群体或在某些情况下分解明确定义的群体(请参阅本文)。

It sounds like what you're looking for is Network Modularity. Given a graph, and a partition (breaking the graph into disjoint subgraphs), the modularity is defined as:

The fraction of the edges that fall within the given groups minus the
expected such fraction if edges were distributed at random.

Modularity was the basis of some of the first community detection algorithms on networks, which try to find sets of nodes that are densely connected. Recently, modularity has been shown to be a poor metric for community detection though because of resolution limits that fail to detect small groups or break apart well defined groups in certain cases (see this paper).

鱼忆七猫命九 2024-12-02 11:09:56

现在除了模块化之外还有其他方法,旨在克服工作中提到的限制,例如 surprise ;或 B 分数和 C 分数(设计为显着性指数)。

And there are now other approaches than modularity, designed to overcome the limitations mentionned by job, such as surprise; or the B- and C-scores (designed to be significance indices).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文