节点在树中的位置作为特征向量?
背景 我有一棵节点树,我正在尝试运行一些机器学习算法来对它们进行分类。我想要使用的功能之一是节点在树中的位置,即较近的节点可能属于同一类。
我的问题 我将所有特征表示为数字向量。关于如何将树中的位置表示为向量有什么想法吗?那么两个向量的距离 b/n 对应于树中节点之间的距离? (我有一棵深度约为 5-7 左右、分支约为 2-3 的小树)
我尝试过的 PS 我读到了有关查找 2 个节点之间最短距离的算法(查找每个节点到其最近的共同祖先的距离),我发现的一个想法是拥有一个向量 x,其中每个索引对应于树中可能的祖先。然后设置 x[i] = 从该祖先开始的级别数。问题是 - 我不知道如何处理不是祖先的节点。
Background
I have a tree of nodes and I'm trying to run some machine learning algorithms to classify them. One of the features I want to use is the position of the nodes in the tree, i.e. closer nodes are likely to be in the same class.
My Problem
I'm representing all the features as a vector of numbers. Any thoughts on how I can represent position in the tree as a vector? So that distance b/n two vectors corresponds to distance between nodes in the tree? (I have a small tree of depth around 5-7 and branching around 2-3)
What I tried
P.S. I read about algorithms to find shortest distance between 2 nodes (finding each one's distance to their closest common ancestor) One idea I found was to have a vector x where each index corresponds to possible ancestors in the tree. Then set x[i] = numbers of levels from that ancestor. The problem with that is- I don't know what to do with nodes that aren't ancestors.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
只需将树的路径作为向量即可。然后简单计算两条路径之间的长度差。例如。 2,3,1,5,3 是一条路径。 2,3,3,5,9,5 是另一条路径。所以 2,3 它们有共同点。所以差的长度是 1,5,3 和 3,5,9,5,即 7。祝你好运
just put the path of the tree as the vector. then simply calculate the length of the difference between the two paths. so for example. 2,3,1,5,3 is one path. and 2,3,3,5,9,5 is another path. so 2,3 they have in common. so the length of the difference is 1,5,3 and 3,5,9,5 which is 7. good luck
所以实际上有一个非常好的方法来导出你想要的特征;您可以使用 MDS 来执行此操作。
MDS 的作用是,它需要一个 N × N 矩阵(这里 N 是节点数),其中条目 a_{i,j} 是项目 i 和项目 j(节点 i 和节点 j)之间的距离,并且对于每个项目 i将返回一个 D(预先指定的)位置向量 D_i,使得 D_i 和 D_j 之间的距离约为 a_{i,j}。
因此,我们可以通过一些预处理来获得特征向量。首先,找到每对节点的最短距离(以跳为单位)(您可以使用 Floyd-Warshall)然后使用距离矩阵作为 MDS 的输入并指定位置向量的维度数,MDS 将输出所述维度的位置向量。
如果您搜索网络,我相信您可以找到 Floyd-Warshall 和 MDS 的开源实现。
So there is actually a very nice way to derive the features that you want; you can do so with MDS.
What MDS does is that it takes a N by N matrix (here N is number of nodes) where entry a_{i,j} is the distance between item i and item j (node i and node j) and for each item i it will return a D (pre-specified) position vector, D_i, such that the distance between D_i and D_j is approximately a_{i,j}.
Thus, we can have your feature vector with a bit of pre-processing. First, find the shortest distance (in hops) for each pair of nodes (you could use Floyd-Warshall) then use the distance matrix as input for MDS and specify the number of dimensions for you're position vector, and MDS will output position vectors of said dimensions.
If you search the web I'm sure you can find open sourced implementations to both Floyd-Warshall and MDS.