计算邻接矩阵并找到不同节点类型/类的最近的邻居
在描述问题之前,我会总结我在寻找的我的想法。 i think 我需要一种最近邻居搜索的方法,该方法受python中的节点类型限制(在我的情况下,节点代表一个原子,节点类型代表原子的元素)。因此,仅返回给定类型的最近邻居。也许我在错误地措辞我的问题。我找不到任何现有方法。
我正在编写一些环统计代码,以找到用于分子动力学模拟数据的不同类型的环。输入数据结构是原子ID,原子类型和XYZ位置的大型数组。
例如。
目前我仅考虑单元素系统(例如石墨烯,因此仅存在碳原子)。因此,在找到其最近的邻居并计算邻接矩阵时,每个节点都被认为是相同的类型。
为此,我使用kdtree和scipy.spatial算法,并从任何给定的原子中找到键长r内的所有原子。如果原子在内部,则给定原子的r半径,我认为它已连接,然后填充并相应地更新邻接字典。
def create_adjacency_dict(data, r, leaf_size=5, box_size=None):
from scipy.spatial import KDTree
tree = KDTree(data, leafsize=leaf_size,
boxsize=box_size)
all_nn_indices = tree.query_ball_point(data, r, workers=5) # Calculates neighbours within radius r of a point.
adj_dict = {}
for count, item in enumerate(all_nn_indices):
adj_dict[count] = item # Populate adjacency dictionary
for node, nodes in adj_dict.items():
if node in nodes:
nodes.remove(node) # Remove duplicates
adj_dict = {k: set(v) for k, v in adj_dict.items()}
return adj_dict
我想扩展代码以处理多物种系统。例如Ab
A
/ \
B B
显而易见的方法是仅采用蛮力的欧几里得方法。我的想法是输入分子的键类型,因此,对于AB
我想我仍然可以使用我的当前方法,但是只需搜索给定原子的10个最近的邻居,然后按照与上述相同的方法对每个原子对进行欧几里得搜索。似乎仍然存在更好的方法。
对于这种类型的问题已经存在更好的方法吗?查找受节点类型限制的最近的邻居?或者,也许有人知道我的问题更正确的措辞,这是我认为这里的一个问题。
Before I describe my problem I'll summarise what I think I'm looking for. I think I need a method for nearest-neighbor searches which are restricted by node type in python (In my case a node represents an atom and the node type represents the element the atom is). So only returning the nearest neighbors of a given type. Maybe I am wording my problem incorrectly. I haven't been able to find any existing methods for this.
I am writing some ring statistics code to find different types of rings for molecular dynamics simulation data. The input data structure is a big array of atom id, atom type, and XYZ positions.
For example.
At the moment I only consider single-element systems (for example graphene, so only Carbon atoms are present). So each node is considered the same type when finding its nearest neighbors and calculating the adjacency matrix.
For this, I am using KDTree and scipy.spatial algorithms and find all atoms within the bond length, r, from any given atom. If an atom is within, r radius of a given atom I consider it connected and then populate and update an adjacency dictionary accordingly.
def create_adjacency_dict(data, r, leaf_size=5, box_size=None):
from scipy.spatial import KDTree
tree = KDTree(data, leafsize=leaf_size,
boxsize=box_size)
all_nn_indices = tree.query_ball_point(data, r, workers=5) # Calculates neighbours within radius r of a point.
adj_dict = {}
for count, item in enumerate(all_nn_indices):
adj_dict[count] = item # Populate adjacency dictionary
for node, nodes in adj_dict.items():
if node in nodes:
nodes.remove(node) # Remove duplicates
adj_dict = {k: set(v) for k, v in adj_dict.items()}
return adj_dict
I would like to expand the code to deal with multi-species systems. For example AB2, AB2C4 etc (Where A,B and C represent different atomic species). However, I am struggling to figure out a nice way to do this.
A
/ \
B B
The obvious method would be to just do a brute force Euclidean approach. My idea is to input the bond types for a molecule, so for AB2 (shown above), you would input something like AB to indicate the different types of bonds to consider, and then the respective bond lengths. Then loop over each atom finding the distance to all other atoms and, for this example of AB2, if an atom of type A is within the bond length of an atom B, consider them connected and populate the adjacency matrix. However, I'd like to be able to use the code on large datasets of 50,000+ atoms, so this method seems wasteful.
I suppose I could still use my current method, but just search for say the 10 nearest neighbors of a given atom and then do a Euclidean search for each atom pair, following the same approach as above. Still seems like a better method would already exist though.
Do better methods already exist for this type of problem? Finding nearest neighbors restricted by node type? Or maybe someone knows a more correct wording of my problem, which is I think one of my issues here.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
“然后搜索数据。”
这听起来像是那个旧的动画片,有人指向中间带有一个微小标签的幽默的复杂图,上面写着“在这里发生奇迹”
认真的说,我猜想这是您需要的搜索是您需要的 依次优化(您不准确地说)
,这表明您正在通过每个原子进行线性搜索并计算每个原子的距离。可以吗?
这个问题有一个标准答案,称为OCTREE。
https://en.wikipedia.org/wiki/octree/octree 美元代码'戏剧化了这种方法的优势 https://www.netflix.com/title/81074012
"Then search the data."
This sounds like that old cartoon where someone points to a humourously complex diagram with a tiny label in the middle that says "Here a miracle happens"
Seriously, I am guessing that this searching is what you need to optimize ( you do not exactly say )
In turn, this suggests that you are doing a linear search through every atom and calculating the distance of each. Could it be so!?
There is a standard answer for this problem, called an octree.
https://en.wikipedia.org/wiki/Octree
A netflix tv miniseries 'The Billion Dollar Code' dramatizes the advantages of this approach https://www.netflix.com/title/81074012