计算邻接矩阵并找到不同节点类型/类的最近的邻居

发布于 2025-02-11 02:26:36 字数 1889 浏览 2 评论 0原文

在描述问题之前,我会总结我在寻找的我的想法。 i think 我需要一种最近邻居搜索的方法,该方法受python中的节点类型限制(在我的情况下,节点代表一个原子,节点类型代表原子的元素)。因此,仅返回给定类型的最近邻居。也许我在错误地措辞我的问题。我找不到任何现有方法。

我正在编写一些环统计代码,以找到用于分子动力学模拟数据的不同类型的环。输入数据结构是原子ID,原子类型和XYZ位置的大型数组。

例如。

目前我仅考虑单元素系统(例如石墨烯,因此仅存在碳原子)。因此,在找到其最近的邻居并计算邻接矩阵时,每个节点都被认为是相同的类型。

为此,我使用kdtree和scipy.spatial算法,并从任何给定的原子中找到键长r内的所有原子。如果原子在内部,则给定原子的r半径,我认为它已连接,然后填充并相应地更新邻接字典。

def create_adjacency_dict(data, r, leaf_size=5, box_size=None):
    from scipy.spatial import KDTree
    tree = KDTree(data, leafsize=leaf_size,
                  boxsize=box_size)  
    all_nn_indices = tree.query_ball_point(data, r, workers=5)  # Calculates neighbours within radius r of a point.
    adj_dict = {}
    for count, item in enumerate(all_nn_indices):
        adj_dict[count] = item  # Populate adjacency dictionary

    for node, nodes in adj_dict.items():
        if node in nodes:
            nodes.remove(node)  # Remove duplicates

    adj_dict = {k: set(v) for k, v in adj_dict.items()}

    return adj_dict

我想扩展代码以处理多物种系统。例如Ab2 ,Ab2 c 4 等(其中a,b和c代表不同的原子种)。但是,我正在努力找出一种很好的方法。

                                             A
                                            / \
                                           B   B

显而易见的方法是仅采用蛮力的欧几里得方法。我的想法是输入分子的键类型,因此,对于AB2 (如上所述),您将输入AB之类的东西以指示要考虑的不同类型的键,然后指示相应的键长度。 。然后在每个原子上循环循环找到与所有其他原子的距离,对于Ab2 的示例,如果A型原子在原子B的键长内,请考虑它们连接并填充它们邻接矩阵。但是,我希望能够在50,000多种原子的大数据集上使用代码,因此这种方法似乎很浪费。

我想我仍然可以使用我的当前方法,但是只需搜索给定原子的10个最近的邻居,然后按照与上述相同的方法对每个原子对进行欧几里得搜索。似乎仍然存在更好的方法。

对于这种类型的问题已经存在更好的方法吗?查找受节点类型限制的最近的邻居?或者,也许有人知道我的问题更正确的措辞,这是我认为这里的一个问题。

Before I describe my problem I'll summarise what I think I'm looking for. I think I need a method for nearest-neighbor searches which are restricted by node type in python (In my case a node represents an atom and the node type represents the element the atom is). So only returning the nearest neighbors of a given type. Maybe I am wording my problem incorrectly. I haven't been able to find any existing methods for this.

I am writing some ring statistics code to find different types of rings for molecular dynamics simulation data. The input data structure is a big array of atom id, atom type, and XYZ positions.

For example.

Example of input data

At the moment I only consider single-element systems (for example graphene, so only Carbon atoms are present). So each node is considered the same type when finding its nearest neighbors and calculating the adjacency matrix.

For this, I am using KDTree and scipy.spatial algorithms and find all atoms within the bond length, r, from any given atom. If an atom is within, r radius of a given atom I consider it connected and then populate and update an adjacency dictionary accordingly.

def create_adjacency_dict(data, r, leaf_size=5, box_size=None):
    from scipy.spatial import KDTree
    tree = KDTree(data, leafsize=leaf_size,
                  boxsize=box_size)  
    all_nn_indices = tree.query_ball_point(data, r, workers=5)  # Calculates neighbours within radius r of a point.
    adj_dict = {}
    for count, item in enumerate(all_nn_indices):
        adj_dict[count] = item  # Populate adjacency dictionary

    for node, nodes in adj_dict.items():
        if node in nodes:
            nodes.remove(node)  # Remove duplicates

    adj_dict = {k: set(v) for k, v in adj_dict.items()}

    return adj_dict

I would like to expand the code to deal with multi-species systems. For example AB2, AB2C4 etc (Where A,B and C represent different atomic species). However, I am struggling to figure out a nice way to do this.

                                             A
                                            / \
                                           B   B

The obvious method would be to just do a brute force Euclidean approach. My idea is to input the bond types for a molecule, so for AB2 (shown above), you would input something like AB to indicate the different types of bonds to consider, and then the respective bond lengths. Then loop over each atom finding the distance to all other atoms and, for this example of AB2, if an atom of type A is within the bond length of an atom B, consider them connected and populate the adjacency matrix. However, I'd like to be able to use the code on large datasets of 50,000+ atoms, so this method seems wasteful.

I suppose I could still use my current method, but just search for say the 10 nearest neighbors of a given atom and then do a Euclidean search for each atom pair, following the same approach as above. Still seems like a better method would already exist though.

Do better methods already exist for this type of problem? Finding nearest neighbors restricted by node type? Or maybe someone knows a more correct wording of my problem, which is I think one of my issues here.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

千笙结 2025-02-18 02:26:36

“然后搜索数据。”

这听起来像是那个旧的动画片,有人指向中间带有一个微小标签的幽默的复杂图,上面写着“在这里发生奇迹”

“在此处输入映像说明”

认真的说,我猜想这是您需要的搜索是您需要的 依次优化(您不准确地说)

,这表明您正在通过每个原子进行线性搜索并计算每个原子的距离。可以吗?

这个问题有一个标准答案,称为OCTREE。

https://en.wikipedia.org/wiki/octree/octree 美元代码'戏剧化了这种方法的优势 https://www.netflix.com/title/81074012

"Then search the data."

This sounds like that old cartoon where someone points to a humourously complex diagram with a tiny label in the middle that says "Here a miracle happens"

enter image description here

Seriously, I am guessing that this searching is what you need to optimize ( you do not exactly say )

In turn, this suggests that you are doing a linear search through every atom and calculating the distance of each. Could it be so!?

There is a standard answer for this problem, called an octree.

https://en.wikipedia.org/wiki/Octree

A netflix tv miniseries 'The Billion Dollar Code' dramatizes the advantages of this approach https://www.netflix.com/title/81074012

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文