计算邻接矩阵并找到不同节点类型/类的最近的邻居

发布于 2025-02-11 02:26:36 字数 1889 浏览 2 评论 0原文

在描述问题之前，我会总结我在寻找的我的想法。 i think 我需要一种最近邻居搜索的方法，该方法受python中的节点类型限制（在我的情况下，节点代表一个原子，节点类型代表原子的元素）。因此，仅返回给定类型的最近邻居。也许我在错误地措辞我的问题。我找不到任何现有方法。

我正在编写一些环统计代码，以找到用于分子动力学模拟数据的不同类型的环。输入数据结构是原子ID，原子类型和XYZ位置的大型数组。

例如。

目前我仅考虑单元素系统（例如石墨烯，因此仅存在碳原子）。因此，在找到其最近的邻居并计算邻接矩阵时，每个节点都被认为是相同的类型。

为此，我使用kdtree和scipy.spatial算法，并从任何给定的原子中找到键长r内的所有原子。如果原子在内部，则给定原子的r半径，我认为它已连接，然后填充并相应地更新邻接字典。

def create_adjacency_dict(data, r, leaf_size=5, box_size=None):
    from scipy.spatial import KDTree
    tree = KDTree(data, leafsize=leaf_size,
                  boxsize=box_size)  
    all_nn_indices = tree.query_ball_point(data, r, workers=5)  # Calculates neighbours within radius r of a point.
    adj_dict = {}
    for count, item in enumerate(all_nn_indices):
        adj_dict[count] = item  # Populate adjacency dictionary

    for node, nodes in adj_dict.items():
        if node in nodes:
            nodes.remove(node)  # Remove duplicates

    adj_dict = {k: set(v) for k, v in adj_dict.items()}

    return adj_dict

我想扩展代码以处理多物种系统。例如Ab2 ，Ab2 c ₄等（其中a，b和c代表不同的原子种）。但是，我正在努力找出一种很好的方法。

                                             A
                                            / \
                                           B   B

显而易见的方法是仅采用蛮力的欧几里得方法。我的想法是输入分子的键类型，因此，对于AB2 （如上所述），您将输入AB之类的东西以指示要考虑的不同类型的键，然后指示相应的键长度。。然后在每个原子上循环循环找到与所有其他原子的距离，对于Ab2 的示例，如果A型原子在原子B的键长内，请考虑它们连接并填充它们邻接矩阵。但是，我希望能够在50,000多种原子的大数据集上使用代码，因此这种方法似乎很浪费。

我想我仍然可以使用我的当前方法，但是只需搜索给定原子的10个最近的邻居，然后按照与上述相同的方法对每个原子对进行欧几里得搜索。似乎仍然存在更好的方法。

对于这种类型的问题已经存在更好的方法吗？查找受节点类型限制的最近的邻居？或者，也许有人知道我的问题更正确的措辞，这是我认为这里的一个问题。

原文

Before I describe my problem I'll summarise what I think I'm looking for. I think I need a method for nearest-neighbor searches which are restricted by node type in python (In my case a node represents an atom and the node type represents the element the atom is). So only returning the nearest neighbors of a given type. Maybe I am wording my problem incorrectly. I haven't been able to find any existing methods for this.

I am writing some ring statistics code to find different types of rings for molecular dynamics simulation data. The input data structure is a big array of atom id, atom type, and XYZ positions.

For example.

At the moment I only consider single-element systems (for example graphene, so only Carbon atoms are present). So each node is considered the same type when finding its nearest neighbors and calculating the adjacency matrix.

For this, I am using KDTree and scipy.spatial algorithms and find all atoms within the bond length, r, from any given atom. If an atom is within, r radius of a given atom I consider it connected and then populate and update an adjacency dictionary accordingly.

def create_adjacency_dict(data, r, leaf_size=5, box_size=None):
    from scipy.spatial import KDTree
    tree = KDTree(data, leafsize=leaf_size,
                  boxsize=box_size)  
    all_nn_indices = tree.query_ball_point(data, r, workers=5)  # Calculates neighbours within radius r of a point.
    adj_dict = {}
    for count, item in enumerate(all_nn_indices):
        adj_dict[count] = item  # Populate adjacency dictionary

    for node, nodes in adj_dict.items():
        if node in nodes:
            nodes.remove(node)  # Remove duplicates

    adj_dict = {k: set(v) for k, v in adj_dict.items()}

    return adj_dict

I would like to expand the code to deal with multi-species systems. For example AB₂, AB₂C₄ etc (Where A,B and C represent different atomic species). However, I am struggling to figure out a nice way to do this.

                                             A
                                            / \
                                           B   B

The obvious method would be to just do a brute force Euclidean approach. My idea is to input the bond types for a molecule, so for AB₂ (shown above), you would input something like AB to indicate the different types of bonds to consider, and then the respective bond lengths. Then loop over each atom finding the distance to all other atoms and, for this example of AB₂, if an atom of type A is within the bond length of an atom B, consider them connected and populate the adjacency matrix. However, I'd like to be able to use the code on large datasets of 50,000+ atoms, so this method seems wasteful.

I suppose I could still use my current method, but just search for say the 10 nearest neighbors of a given atom and then do a Euclidean search for each atom pair, following the same approach as above. Still seems like a better method would already exist though.

Do better methods already exist for this type of problem? Finding nearest neighbors restricted by node type? Or maybe someone knows a more correct wording of my problem, which is I think one of my issues here.

分享到QQ

分享到微博